COMMUNITY DETECTION IN TEMPORAL MULTILAYER NETWORKS,
AND ITS APPLICATION TO CORRELATION NETWORKS

MARYA BAZZI, MASON A. PORTER, STACY WILLIAMS, MARK MCDONALD,
DANIEL J. FENN, AND SAM D. HOWISON


Abstract. 

Networks are a convenient way to represent complex systems of interacting entities. Many 
networks contain “communities” of nodes that are more densely connected to each other than to nodes 
in the rest of the network. In this paper, we investigate the detection of communities in temporal 
networks represented as multilayer networks. As a focal example, we study time-dependent financial- 
asset correlation networks. We first argue that the use of the “modularity” quality function—which 
is defined by comparing edge weights in an observed network to expected edge weights in a “null 
network”—is application-dependent. We differentiate between “null networks” and “null models” 
in our discussion of modularity maximization, and we highlight that the same null network can 
correspond to different null models. We then investigate a multilayer modularity-maximization 
problem to identify communities in temporal networks. Our multilayer analysis only depends on 
the form of the maximization problem and not on the specific quality function that one chooses. 
We introduce a diagnostic to measure persistence of community structure in a multilayer network 
partition. We prove several results that describe how the multilayer maximization problem measures 
a trade-off between static community structure within layers and higher values of persistence across 
layers. We also discuss some implementation issues that the popular “Louvain” heuristic faces with 
temporal multilayer networks and suggest ways to mitigate them. 

Key words. Community structure, multilayer networks, temporal networks, modularity maximization, financial correlation networks.
1. Introduction. In its simplest form, a network is a graph: it consists of a set of nodes that represent entities and a set of edges between pairs of nodes that represent interactions between those entities. One can consider weighted graphs (in which each edge has an associated edge weight that quantifies the interaction of interest) or unweighted graphs (weighted graphs with binary edge weights). Networks provide useful representations of complex systems across many disciplines. Common types include social networks (which arise via offline and/or online interactions), information networks (e.g., hyperlinks between webpages in the World Wide Web), infrastructure networks (e.g., transportation routes between cities), and biological networks (e.g., metabolic interactions between cells or proteins, food webs, etc.).

Given a network representation of a system, it can be useful to apply a coarse-graining technique in order to investigate features that lie between features at the “microscale” (e.g., nodes and pairwise interactions) and the “macroscale” (e.g., total edge weight and degree distribution) [4, 5]. One thereby studies “mesoscale” features such as core-periphery structure and (especially) community structure. Loosely speaking, a community (or cluster) in a network is a set of nodes that are “more densely” connected to each other than they are to nodes in the rest of the network [23, 55]. Giving a precise definition of “densely connected” is, of course, necessary to have a method for community detection. It is important to recognize at the outset that this definition is subjective and, in particular, may depend on the application in question. Correspondingly, community detection methods may need to be tailored. We restrict ourselves to hard partitions, in which each node is assigned to exactly one community, and we use the term “partition” to mean “hard partition”. It is also important, but beyond the scope of this paper, to consider “soft partitions”, in which communities can overlap [23, 33, 52, 55].


Analysis of community structure has been very useful in a wide range of applications, many of which are described in [23, 25, 49, 55]. In social networks, communities can reveal groups of people with common interests, places of residence, or other similarities [50, 66]. In biological systems, communities can reveal functional groups that are responsible for synthesizing or regulating an important chemical product [29]. In the present paper, we use financial-asset correlation networks as examples. Despite the diversity of markets, financial products, and geographical locations, financial assets can exhibit strong time-dependent correlations, both within and between asset classes. It is a primary concern for market practitioners (e.g., for portfolio diversification) to estimate the strengths of these correlations and to identify sets of assets that are highly correlated [43, 67].


Most methods for detecting communities are designed for static networks. However, in many applications, entities and/or interactions between entities evolve in time. In such applications, one can use the formalism of temporal networks, where nodes and/or their edge weights vary in time [31]. This is important for numerous applications, including person-to-person communication [68], one-to-many information dissemination (e.g., Twitter networks [27] and Facebook networks [70]), cell biology [31], neuroscience, ecology [31], finance [20–22, 51], and more.


Two main approaches have been adopted to detect communities in time-dependent networks. The first entails constructing a static network by aggregating snapshots of the evolving network at different points in time into a single network (e.g., by taking the mean or total edge weight for each edge across all time points, which can be problematic if the set of nodes varies in time and which also makes restrictive assumptions on the interaction dynamics between entities [30]). One can then use standard network techniques. The second approach entails using static community-detection techniques on each element of a time-ordered sequence of networks at different times or on each element of a time-ordered sequence of network aggregations (computed as above) over different time intervals (which can be either overlapping or nonoverlapping) and then tracking the communities across the sequence.

Footnote (on network aggregation): One needs to distinguish between this kind of aggregation and the averaging of a set of time series over a moving window to construct a correlation matrix, which one can then interpret as a fixed-time snapshot of a time-evolving network. Although both involve averaging over a time window, the former situation entails averaging a network, and the latter situation entails averaging over a collection of time series (one for each node) with no directly observable edge weights.

A third approach consists of embedding a time-ordered sequence of networks in a larger network [18, 44, 54]. Each element of the sequence is a network layer, and nodes at different time points are joined by inter-layer edges. This approach was introduced in [44], and the resulting network is a type of multilayer network [11, 35]. The main difference between this approach and the previous approach is that the presence of nonzero inter-layer edges introduces a dependence between communities identified in one layer and connectivity patterns in other layers. Thus far, most computations that have used a multilayer representation of temporal networks have assumed that inter-layer connections are “diagonal” (i.e., they exist only between copies of the same node) and “ordinal” (i.e., they exist only between consecutive layers). Diagonal coupling is a natural model of the persistence of node identity in time, and ordinal coupling preserves the time ordering.


The authors of [44] derived a generalization of modularity maximization, a popular clustering method for static networks, to multilayer networks. Modularity is a function that measures the “quality” of a network partition into disjoint sets of nodes by computing the difference between the total edge weight in sets in the observed network and the total expected edge weight in the same sets in a “null network” generated from some “null model” [23, 55]. Modularity maximization consists of maximizing the modularity quality function over the space of network partitions. (In practice, given the combinatorial complexity of this maximization problem, one uses some computational heuristic and finds a local maximum [28].) Intuitively, the null model controls for connectivity patterns that one anticipates finding in a network, and one uses modularity maximization to identify connectivity patterns in an observed network that are stronger than anticipated. We give a precise definition of the modularity function for single-layer networks in Section 2, where (importantly) we distinguish between a “null network” and a “null model” in modularity maximization. In a later section, we discuss the choice of null network for a given application.

We then describe the generalization of single-layer modularity to multilayer networks proposed in [44]. To date, almost no theory has explained how a multilayer partition obtained with zero inter-layer coupling (which reduces to single-layer modularity maximization on each layer independently) differs from a multilayer partition obtained with nonzero inter-layer coupling. We prove several theoretical properties of an optimal solution for the multilayer maximization problem to better understand how such partitions differ and how one can exploit this difference in practice. We also describe two implementation issues that arise when using the popular Louvain heuristic [10] to solve the multilayer maximization problem, and we suggest ways to mitigate them. These multilayer results are independent of the choice of quality function on individual layers and only depend on the form of the maximization problem. The final section contains a concluding discussion.

2. Single-layer modularity maximization. 

2.1. The modularity function. Consider an $N$-node network $\mathcal{G}$ and let the edge weights between pairs of nodes be $\{A_{ij} \mid i,j \in \{1,\ldots,N\}\}$, so that $A = (A_{ij}) \in \mathbb{R}^{N \times N}$ is the adjacency matrix of $\mathcal{G}$. In this paper, we only consider symmetric adjacency matrices (and hence undirected networks), so $A_{ij} = A_{ji}$ for all $i$ and $j$. The strength of a node $i$ is

$$k_i = \sum_{j=1}^{N} A_{ij} = \sum_{j=1}^{N} A_{ji}, \tag{2.1}$$

and it is given by the $i$th row (or column) sum of $A$.

When studying the structure of a network, it is useful to compare what is observed with what is anticipated. We define a null model to be a probability distribution on the set of adjacency matrices and a null network to be the expected adjacency matrix under a specified null model. In a loose sense, null models play the role of prior models, as they control for features that one anticipates finding in the system under investigation. One can thereby take into account known (or suspected) connectivity patterns that might obscure unknown connectivity patterns that one hopes to discover via processes like community detection. For example, in social networks, one often takes the strength of a node in a null network to be its observed strength $k_i$ [47, 49, 55]. We discuss the use of this null network for financial-asset correlation networks in a later section. In spatial networks that represent the spread of a disease or information between different locations, some authors have used null networks in which edge weights between two locations scale inversely with the distance between them [19, 60].

As we discussed in Section 1, one uses modularity maximization to partition a network into sets of nodes called “communities” that have a larger total internal edge weight than the expected total internal edge weight in the same sets in a null network, generated from some null model [23, 49, 50, 55]. Modularity maximization consists of finding a partition that maximizes this difference [23, 55]. (As mentioned earlier, in practice one uses some computational heuristic and finds a local maximum [28].) In the present paper, we use the term “modularity” for an arbitrary choice of null network, and we ignore any normalization constant that depends on the choice of null network but does not affect the solution of the modularity-maximization problem for a given null network. Modularity thus acts as a “quality function” $Q : \mathcal{C} \to \mathbb{R}$, where the set $\mathcal{C}$ is the set of all possible $N$-node network partitions.

Suppose that we have a partition $C$ of a network into $K$ disjoint sets of nodes $\{C_1, \ldots, C_K\}$. We can then define a map $c(\cdot)$ from the set of nodes $\{1,\ldots,N\}$ to the set of integers $\{1,\ldots,K\}$ such that $c(i) = c(j) = k$ if and only if nodes $i$ and $j$ lie in $C_k$. We call $c(i)$ the set assignment (or community assignment when $C$ is a global or local maximum) of node $i$ in partition $C$. The value of modularity for a given partition $C$ is then

$$Q(C \mid A; P) := \sum_{i,j=1}^{N} (A_{ij} - P_{ij})\,\delta(c_i, c_j), \tag{2.2}$$

where $P = (P_{ij}) \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the null network, $c_i$ is shorthand notation for $c(i)$, and $\delta(c_i, c_j)$ is the Kronecker delta function. We state the modularity-maximization problem as follows:

$$\max_{C \in \mathcal{C}} \sum_{i,j=1}^{N} (A_{ij} - P_{ij})\,\delta(c_i, c_j), \tag{2.3}$$

which we can also write as $\max_{C \in \mathcal{C}} Q(C \mid B)$ or $\max_{C \in \mathcal{C}} \sum_{i,j=1}^{N} B_{ij}\,\delta(c_i, c_j)$, where $B = A - P$ is the so-called modularity matrix [47]. It is clear from (2.3) that pairwise contributions to modularity are only counted when two nodes are assigned to the same set. These contributions are positive (respectively, negative) when the observed edge weight $A_{ij}$ between nodes $i$ and $j$ is larger (respectively, smaller) than the expected edge weight $P_{ij}$ between them. If $A_{ij} < P_{ij}$ for all $i$ and $j$, then the optimal solution is $N$ singleton communities. Conversely, if $A_{ij} > P_{ij}$ for all $i$ and $j$, then the optimal solution is a single $N$-node community. To obtain a partition of a network with a high value of modularity, one hopes to have many edges within sets that satisfy $A_{ij} > P_{ij}$ and few edges within sets that satisfy $A_{ij} < P_{ij}$. As is evident from equation (2.3), what one regards as “densely connected” in this setting depends fundamentally on the choice of null network.
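The quantity in (2.2) is straightforward to evaluate directly. The following is a minimal sketch in plain Python, not the authors' implementation. The toy network (two triangles joined by one edge) and the uniform null network are assumptions chosen only to make the arithmetic easy to follow, and the function names `strengths` and `modularity` are hypothetical.

```python
def strengths(A):
    """Node strengths k_i: the row sums of a symmetric adjacency matrix A."""
    return [sum(row) for row in A]


def modularity(A, P, c):
    """Unnormalized modularity: sum over i, j of (A_ij - P_ij) * delta(c_i, c_j).

    A and P are N x N nested lists; c[i] is the set assignment of node i.
    """
    n = len(A)
    return sum(
        A[i][j] - P[i][j]
        for i in range(n)
        for j in range(n)
        if c[i] == c[j]
    )


# Toy network (an assumption for illustration): two triangles joined by one edge.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

# Uniform null network (also an assumption): every entry is the mean edge weight.
mean_weight = sum(map(sum, A)) / len(A) ** 2
P = [[mean_weight] * len(A) for _ in A]

q_two_triangles = modularity(A, P, [0, 0, 0, 1, 1, 1])
q_single_set = modularity(A, P, [0] * 6)
q_singletons = modularity(A, P, list(range(6)))
```

With this null network, the partition into the two triangles scores higher than either the single-set partition or the all-singleton partition, which illustrates how the choice of `P` determines what counts as "densely connected".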

It can be useful to write the modularity-maximization problem using the trace of matrices [47]. As before, we consider a partition $C$ of a network into $K$ sets of nodes $\{C_1, \ldots, C_K\}$. We define the partition matrix $S \in \{0,1\}^{N \times K}$ as

$$S_{ij} = \delta(c_i, j), \tag{2.4}$$

where $j \in \{1,\ldots,K\}$ and $c_i = j$ means that node $i$ lies in $C_j$. The columns of $S$ are orthogonal, and the $j$th column sum of $S$ gives the number of nodes in $C_j$. This yields

$$\sum_{i,j=1}^{N} B_{ij}\,\delta(c_i, c_j) = \sum_{i,j=1}^{N} \sum_{k=1}^{K} S_{ik} B_{ij} S_{jk} = \mathrm{Tr}(S^T B S),$$

where the $(i,i)$th term of $S^T B S$ is the sum of $B_{uv}$ over all ordered pairs of nodes $u, v$ in $C_i$ (so each within-set edge weight is counted twice). (The $(i,j)$th off-diagonal term is the sum of $B_{uv}$ over nodes $u$ in $C_i$ and $v$ in $C_j$.) It follows that one can restate the modularity-maximization problem in (2.3) as

$$\max_{S \in \mathcal{S}} \mathrm{Tr}(S^T B S), \tag{2.5}$$

where $\mathcal{S}$ is the set of all partition matrices in $\{0,1\}^{N \times K}$ (with $K \le N$).
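The equivalence between the Kronecker-delta form and the trace form can be checked numerically. The sketch below uses plain Python lists. The matrix `B` is an arbitrary symmetric example (an assumption, not data from the paper), the function names are hypothetical, and sets are labeled from 0 rather than 1.

```python
def partition_matrix(c, K):
    """N x K matrix S with S[i][j] = 1 if node i lies in set j, and 0 otherwise."""
    return [[1 if c[i] == j else 0 for j in range(K)] for i in range(len(c))]


def trace_form(B, S):
    """Tr(S^T B S), computed as the sum over k, i, j of S[i][k] * B[i][j] * S[j][k]."""
    n, K = len(S), len(S[0])
    total = 0.0
    for k in range(K):
        for i in range(n):
            if S[i][k]:
                for j in range(n):
                    total += B[i][j] * S[j][k]
    return total


def delta_form(B, c):
    """Sum over i, j of B[i][j] * delta(c_i, c_j)."""
    n = len(B)
    return sum(B[i][j] for i in range(n) for j in range(n) if c[i] == c[j])


# Arbitrary symmetric modularity matrix (an assumption for illustration).
B = [
    [-0.5, 1.0, 0.5, -1.0],
    [1.0, -0.5, -0.25, 0.0],
    [0.5, -0.25, -0.5, 2.0],
    [-1.0, 0.0, 2.0, -0.5],
]
c = [0, 0, 1, 1]
S = partition_matrix(c, 2)

# Column sums of S give the number of nodes in each set.
set_sizes = [sum(S[i][j] for i in range(len(S))) for j in range(2)]
```

Both `trace_form(B, S)` and `delta_form(B, c)` add up the same entries of `B`, namely those whose row and column indices lie in the same set.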

Modularity maximization is one of myriad community-detection methods [23], and it has many limitations (e.g., a resolution limit on the size of communities [24] and a huge number of nearly degenerate local maxima [28]). Nevertheless, it is a popular method (which has been used successfully in numerous applications), and the ability to specify explicitly what one anticipates is a useful (and underexploited) feature for users working on different applications. In a later section, we make some observations on one’s choice of null network when using the modularity quality function.

2.2. The Louvain computational heuristic. For a given modularity matrix $B$, a solution to the modularity-maximization problem is guaranteed to exist in any network with a finite number of nodes. However, the number of possible partitions in an $N$-node network, given by the Bell number, grows at least exponentially with $N$, so an exhaustive search of the space of partitions is infeasible. Modularity maximization was proven in [14] to be an NP-hard problem (at least for the null networks that we consider in this paper), so in practice one relies on computational heuristics. In the present paper, we focus on the Louvain heuristic, which is a locally greedy modularity-increasing sampling process over the set of partitions [10].


The Louvain heuristic consists of two phases, which are repeated iteratively. Initially, each node in the network constitutes a set, which gives an initial partition that consists of $N$ singletons. During phase 1, one considers the nodes one by one (in some order), and one places each node in a set (including its own) that results in the largest increase of modularity. This phase is repeated until one reaches a local maximum (i.e., until one obtains a partition in which the move of a single node cannot increase modularity). Phase 2 consists of constructing a reduced network $\mathcal{G}'$ from the sets of nodes in $\mathcal{G}$ that one obtains after the convergence of phase 1. We denote the sets in $\mathcal{G}$ at the end of phase 1 by $\{C_1, \ldots, C_{N'}\}$ (where $N' \le N$) and the set assignment of node $i$ in this partition by $c_i$. Each set $C_k$ in $\mathcal{G}$ constitutes a node $k$ in $\mathcal{G}'$, and the reduced modularity matrix of $\mathcal{G}'$ is

$$B' = S^T B S,$$

where $S$ is the partition matrix of $\{C_1, \ldots, C_{N'}\}$. This ensures that the all-singleton partition in $\mathcal{G}'$ has the same value of modularity as the partition of $\mathcal{G}$ that we identified at the end of phase 1. One then repeats phase 1 on the reduced network and continues iterating until the heuristic converges (i.e., until phase 2 induces no further changes).
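The two phases described above can be sketched for a single pass as follows. This is a simplified illustration in plain Python, not the implementation used in the paper. The function names, the tolerance `1e-12`, the random seed, and the toy modularity matrix (two triangles joined by an edge, with a uniform null network) are all assumptions. It performs one round of phase 1 followed by one round of phase 2 and omits the outer loop that repeats both phases on the reduced network.

```python
import random


def quality(B, c):
    """Sum of B[i][j] over ordered pairs (i, j) assigned to the same set."""
    n = len(B)
    return sum(B[i][j] for i in range(n) for j in range(n) if c[i] == c[j])


def phase_one(B, seed=0):
    """Greedy single-node moves from the all-singleton partition (B symmetric)."""
    rng = random.Random(seed)
    n = len(B)
    c = list(range(n))
    moved = True
    while moved:
        moved = False
        order = list(range(n))
        rng.shuffle(order)  # randomized node order at the start of each sweep
        for i in order:
            # link[s] = 2 * (sum of B[i][j] over nodes j != i currently in set s).
            # Moving i from its set to s changes the quality by link[s] - link[c[i]].
            link = {c[i]: 0.0}
            for j in range(n):
                if j != i:
                    link[c[j]] = link.get(c[j], 0.0) + 2.0 * B[i][j]
            best = max(link, key=link.get)
            if link[best] > link[c[i]] + 1e-12:
                c[i] = best
                moved = True
    relabel = {s: k for k, s in enumerate(sorted(set(c)))}
    return [relabel[s] for s in c]


def phase_two(B, c):
    """Reduced modularity matrix B' = S^T B S for the partition c."""
    K = max(c) + 1
    reduced = [[0.0] * K for _ in range(K)]
    for i in range(len(B)):
        for j in range(len(B)):
            reduced[c[i]][c[j]] += B[i][j]
    return reduced


# Toy modularity matrix (an assumption): two triangles joined by one edge,
# compared against a uniform null network equal to the mean edge weight.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
mean_weight = sum(map(sum, A)) / len(A) ** 2
B = [[a - mean_weight for a in row] for row in A]

c = phase_one(B)
B_reduced = phase_two(B, c)
```

Because every accepted move strictly increases the quality, phase 1 terminates, and the all-singleton partition of `B_reduced` has the same quality as the partition `c` of `B`.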

Because we use a nondeterministic implementation of the Louvain heuristic (in particular, the node order is randomized at the start of each iteration of phase 1), the network partitions that we obtain for a fixed modularity matrix can differ across runs. To account for this, one can compute the frequency of co-classification of nodes into communities for a given modularity matrix $B$ across multiple runs of the heuristic instead of using the output partition of a single run. (Such an approach has been applied to “consensus clustering”; see [59] for an application of such an approach to hierarchical clustering.) We use the term association matrix for a matrix that stores the fraction of runs of a heuristic in which two nodes are placed in the same community, and we use the term co-classification index of nodes $i$ and $j$ to designate the $(i,j)$th entry of an association matrix.
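An association matrix can be computed from a list of output partitions as in the following sketch. The three partitions are hand-written stand-ins for the outputs of repeated runs (an assumption), and `association_matrix` is a hypothetical helper name.

```python
def association_matrix(partitions):
    """Entry (i, j) is the fraction of partitions that place i and j in the same set."""
    n = len(partitions[0])
    runs = len(partitions)
    return [
        [sum(c[i] == c[j] for c in partitions) / runs for j in range(n)]
        for i in range(n)
    ]


# Stand-ins for the outputs of three runs of a heuristic on a four-node network.
# The third partition equals the first one up to a relabeling of the sets.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
assoc = association_matrix(partitions)
```

The co-classification index depends only on whether two nodes share a label within each run, so it is unaffected by how the sets are labeled from one run to the next.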

There are many other heuristics that one can employ to maximize modularity [23, 48, 55], but the Louvain heuristic is a popular choice in practice [38]. It is very fast, which is an important consideration in multilayer networks, for which the total number of nodes is the number of nodes in each layer multiplied by the number of layers. In a later section, we point out two issues that the Louvain heuristic (independently of how it is implemented) faces with temporal multilayer networks.

2.3. Multiscale community structure. Many networks include community structure at multiple scales [55], and some systems even have a hierarchical community structure of “parts-within-parts” [62]. In such a situation, although there are dense interactions within communities of some size (e.g., friendship ties between students in the same school), there are even denser interactions in subsets of nodes that lie inside these communities (e.g., friendship ties between students in the same school and in the same class year). Some variants of the modularity function have been proposed to detect communities at different scales. A popular choice is to scale the null network by a resolution parameter $\gamma > 0$ to yield a multiscale modularity-maximization problem [58]:

$$\max_{C \in \mathcal{C}} \sum_{i,j=1}^{N} (A_{ij} - \gamma P_{ij})\,\delta(c_i, c_j). \tag{2.6}$$

In some sense, the value of the parameter $\gamma$ determines the importance that one assigns to the null network relative to the observed network. The corresponding modularity matrix and modularity function evaluated at a partition $C$ are $B = A - \gamma P$ and $Q(C \mid A; P; \gamma) = \sum_{i,j=1}^{N} (A_{ij} - \gamma P_{ij})\,\delta(c_i, c_j)$. The special case $\gamma = 1$ yields the modularity matrix and modularity function in the modularity-maximization problem (2.3). This formulation of multiscale modularity has a dynamical interpretation [36, 37] that we will discuss in the next subsection.

Footnote (on the Louvain implementation of Section 2.2): The implementation [34] of the heuristic that we use in this paper is a generalized version of the implementation in [10]. It is independent of the null network (so it takes the modularity matrix as an input to allow an arbitrary choice of null network), and it randomizes the node order at the start of each iteration of the heuristic’s first phase to increase the search space of the heuristic. When one chooses the same null network that was assumed in [10] and uses a node order fixed to $\{1, \ldots, N\}$ at each iteration of phase 1 (the value of $N$ can change after each iteration of the heuristic’s second phase), then these implementations return the same output.

In most applications of community detection, the adjacency matrices of the observed and null networks have nonnegative entries. In these cases, the solution to (2.6) when $0 < \gamma < \gamma^- = \min_{i \neq j} (A_{ij}/P_{ij})$ is a single community regardless of any structure, however clear, in the observed network, because then

$$B_{ij} = A_{ij} - \gamma P_{ij} > 0 \quad \text{for all distinct } i, j \in \{1, \ldots, N\}.$$

(We exclude diagonal terms because a node is always in its own community.) However, the solution to (2.6) when $\gamma > \gamma^+ = \max_{i \neq j} (A_{ij}/P_{ij})$ is $N$ singleton communities because

$$B_{ij} = A_{ij} - \gamma P_{ij} < 0 \quad \text{for all distinct } i, j \in \{1, \ldots, N\}.$$

Partitions at these boundary values of $\gamma$ correspond to the coarsest and finest possible partitions of a network, and varying the resolution parameter between these bounds makes it possible to examine a network’s community structure at intermediate scales.
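The two bounds can be computed directly from the observed and null networks. The sketch below assumes strictly positive off-diagonal null-network entries so that every ratio is defined. The small weighted network is an assumption, and its null network uses the form $k_i k_j/(2m)$ that the paper introduces in Section 2.4.1. Function names are hypothetical.

```python
def resolution_bounds(A, P):
    """Return (gamma_minus, gamma_plus): the min and max of A_ij / P_ij over i != j.

    Assumes P[i][j] > 0 for all i != j.
    """
    n = len(A)
    ratios = [A[i][j] / P[i][j] for i in range(n) for j in range(n) if i != j]
    return min(ratios), max(ratios)


def off_diagonal_entries(A, P, gamma):
    """Off-diagonal entries of the modularity matrix B = A - gamma * P."""
    n = len(A)
    return [A[i][j] - gamma * P[i][j] for i in range(n) for j in range(n) if i != j]


# Small weighted network with positive off-diagonal weights (an assumption).
A = [
    [0.0, 3.0, 1.0],
    [3.0, 0.0, 2.0],
    [1.0, 2.0, 0.0],
]
k = [sum(row) for row in A]  # node strengths: [4, 5, 3]
two_m = sum(k)               # total edge weight: 12
P = [[ki * kj / two_m for kj in k] for ki in k]

gamma_minus, gamma_plus = resolution_bounds(A, P)
below = off_diagonal_entries(A, P, 0.5 * gamma_minus)
above = off_diagonal_entries(A, P, 1.1 * gamma_plus)
```

For a resolution parameter below `gamma_minus`, every off-diagonal entry of the modularity matrix is positive, so merging any two sets increases the objective. For a parameter above `gamma_plus`, every off-diagonal entry is negative, so splitting always helps.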
For an observed and/or null network with signed edge weights, the intuition behind the effect of varying $\gamma$ in (2.6) on an optimal solution is not straightforward. A single community and $N$ singleton communities do not need to be optimal partitions for any value of $\gamma > 0$. In particular, $B_{ij}$ has the same sign as $A_{ij}$ for sufficiently small values of $\gamma$, and $B_{ij}$ has the opposite sign to $P_{ij}$ for sufficiently large values of $\gamma$. For further discussion, see the later section in which we explore the effect of varying the resolution parameter on an optimal partition for an observed and null network with signed edge weights. We take $\gamma^- = 0$ in numerical experiments with signed networks because optimal partitions can vary in the interval $\gamma \in [0, \gamma^-]$.

It is important to differentiate between a “resolution limit” on the smallest community size that is imposed by a community-detection method [24] and inherently multiscale community structure in a network [23, 55, 62]. For the formulation of multiscale modularity in (2.6), the resolution limit described in [24] applies to any fixed value of $\gamma$. By varying $\gamma$, one can identify communities that are smaller than the limit for any particular $\gamma$ value. In this sense, multiscale formulations of modularity help “mitigate” the resolution limit, though there remain issues [2, 28, 36]. In this paper, we do not address the issue of how to identify communities at different scales, though we note in passing that the literature includes variants of multiscale modularity. We make observations on null networks in a later section, and we illustrate how our observations can manifest in practice using the formulation of multiscale modularity in (2.6). (Our observations hold independently of the formulation of multiscale modularity that one adopts, but the precise manifestation can be different for different variants of multiscale modularity.)

We use the term multiscale community structure to refer to a set $\mathcal{C}_{\mathrm{local}}(\boldsymbol{\gamma})$ of local optima that we obtain with a computational heuristic for a set of (not necessarily all distinct) resolution-parameter values $\boldsymbol{\gamma} = \{\gamma_1, \ldots, \gamma_l\}$, where $\gamma^- = \gamma_1 \le \cdots \le \gamma_l = \gamma^+$. We use the term multiscale association matrix for an association matrix $\hat{A} \in [0,1]^{N \times N}$ that stores the co-classification index of all pairs of nodes for partitions in this set:

$$\hat{A}_{ij} = \frac{\sum_{C \in \mathcal{C}_{\mathrm{local}}(\boldsymbol{\gamma})} \delta(c_i, c_j)}{|\mathcal{C}_{\mathrm{local}}(\boldsymbol{\gamma})|}. \tag{2.7}$$

We use this matrix repeatedly in our computational experiments in a later section.

Footnote (on values $\gamma > \gamma^+$ for signed networks): For $\gamma > \gamma^+$, one can show that all modularity contributions no longer change signs; these are negative (respectively, positive) between pairs of nodes with $P_{ij} > 0$ (respectively, $P_{ij} < 0$).

2.4. Null models and null networks. In this section, we describe three null networks. In the computational experiments of a later section, we make several observations on the interpretation of communities that we obtain from Pearson correlation matrices using each of these null networks.

2.4.1. Newman-Girvan (NG) null network. A popular choice of null network for networks with positive edge weights is the Newman-Girvan (NG) null network, whose adjacency-matrix entries are $P_{ij} = k_i k_j/(2m)$, where $k_i$ are the observed node strengths [46, 50]. This yields the equivalent maximization problems

$$\max_{C \in \mathcal{C}} \sum_{i,j=1}^{N} \left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j) \quad \Longleftrightarrow \quad \max_{S \in \mathcal{S}} \mathrm{Tr}\left[S^T\left(A - \frac{k k^T}{2m}\right)S\right], \tag{2.8}$$

where $k = A\mathbf{1}$ is the $N \times 1$ vector of node strengths and $2m = \mathbf{1}^T A \mathbf{1}$ is the total edge weight of the observed network. This null network can be derived from a variety of null models. One way to generate an unweighted network with expected adjacency matrix $k k^T/(2m)$ is to generate each of its edges and self-edges with probability $k_i k_j/(2m)$ (provided $k_i k_j \le 2m$ for all $i, j$). That is, the presence or absence of each edge and self-edge is a Bernoulli random variable with probability $k_i k_j/(2m)$ [12, 13]. More generally, any probability distribution on the set of adjacency matrices that satisfies $\mathbb{E}\left(\sum_{j=1}^{N} W_{ij}\right) = k_i$ (i.e., the expected strength equals the observed strength) and $\mathbb{E}(W_{ij}) = f(k_i) f(k_j)$ for some real-valued function $f$ has an expected adjacency matrix of $\mathbb{E}(W) = k k^T/(2m)$. The adjacency matrix of the NG null network is symmetric and positive semidefinite.
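The NG null network and the Bernoulli construction described above can be sketched as follows. This is a minimal illustration in plain Python rather than the authors' code. The toy network (two triangles joined by an edge), the function names, and the seed are assumptions, and the sampler requires every $k_i k_j/(2m)$ to be at most 1.

```python
import random


def ng_null(A):
    """Return (k, two_m, P) with P[i][j] = k_i * k_j / (2m)."""
    k = [sum(row) for row in A]
    two_m = sum(k)
    P = [[ki * kj / two_m for kj in k] for ki in k]
    return k, two_m, P


def sample_unweighted(P, rng):
    """Draw a symmetric 0/1 matrix in which each edge and self-edge (i, j),
    with i <= j, is present independently with probability P[i][j]."""
    n = len(P)
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if rng.random() < P[i][j]:
                W[i][j] = W[j][i] = 1
    return W


# Toy network (an assumption): two triangles joined by one edge.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
k, two_m, P = ng_null(A)
row_sums = [sum(row) for row in P]  # expected strengths under the null network
W = sample_unweighted(P, random.Random(0))
```

The row sums of `P` reproduce the observed strengths `k`, which is the sense in which this null network controls for node strength.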

Footnote (on the expected adjacency matrix of the NG null network): The linearity of the expectation and the assumptions $\mathbb{E}\left(\sum_{j=1}^{N} W_{ij}\right) = k_i$ and $\mathbb{E}(W_{ij}) = f(k_i) f(k_j)$ imply that $f(k_i) = k_i / \sum_{j=1}^{N} f(k_j)$ and $\sum_{j=1}^{N} f(k_j) = \sqrt{2m}$. Combining these equations gives the desired result.

We briefly describe a way of deriving the NG null network from a model on time-series data (in contrast to a model on a network). The partial correlation $\mathrm{corr}(a, b \mid c)$ between $a$ and $b$ given $c$ is the Pearson correlation between the residuals that result from the linear regression of $a$ with $c$ and $b$ with $c$, and it is given by

$$\mathrm{corr}(a, b \mid c) = \frac{\mathrm{corr}(a, b) - \mathrm{corr}(a, c)\,\mathrm{corr}(b, c)}{\sqrt{1 - \mathrm{corr}^2(a, c)}\,\sqrt{1 - \mathrm{corr}^2(b, c)}}. \tag{2.9}$$

Suppose that the data used to construct the observed network is a set of time series $\{z_i \mid i \in \{1, \ldots, N\}\}$, where $z_i = \{z_i(t) \mid t \in T\}$ and $T$ is a discrete set of time points. The authors in [41] pointed out that when $A_{ij} = \mathrm{corr}(z_i, z_j)$, then $k_i = \mathrm{cov}(\hat{z}_i, \hat{z}_{\mathrm{tot}})$ and thus that

$$\frac{k_i k_j}{2m} = \mathrm{corr}(z_i, \hat{z}_{\mathrm{tot}})\,\mathrm{corr}(z_j, \hat{z}_{\mathrm{tot}}), \tag{2.10}$$

where $\hat{z}_i(t) = (z_i(t) - \langle z_i \rangle)/\sigma(z_i)$ is a standardized time series and $\hat{z}_{\mathrm{tot}}(t) = \sum_{i=1}^{N} \hat{z}_i(t)$ is the sum of the standardized time series. Taking $a = z_i$, $b = z_j$, and $c = \hat{z}_{\mathrm{tot}}$, equation (2.9) implies that if $\mathrm{corr}(z_i, z_j \mid \hat{z}_{\mathrm{tot}}) = 0$, then $\mathrm{corr}(z_i, z_j) = k_i k_j/(2m)$. That is, Pearson correlation coefficients between pairs of time series that satisfy $\mathrm{corr}(z_i, z_j \mid \hat{z}_{\mathrm{tot}}) = 0$ are precisely the adjacency-matrix entries of the NG null network. One way of generating a set of time series in which pairs of distinct time series satisfy this condition is to assume that each standardized time series depends linearly on the mean time series and that residuals are mutually uncorrelated (i.e., $\hat{z}_i = \alpha_i \hat{z}_{\mathrm{tot}}/N + \beta_i + \epsilon_i$ for some $\alpha_i, \beta_i \in \mathbb{R}$ and $\mathrm{corr}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$).
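The identity (2.10) can be verified numerically on synthetic data. The sketch below generates a few time series with a shared component (the generating process, the seed, and all names are assumptions for illustration), builds the Pearson correlation matrix including its unit diagonal, and compares $k_i k_j/(2m)$ with the product of correlations against the summed standardized series.

```python
import math
import random


def standardize(z):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = sum(z) / len(z)
    std = math.sqrt(sum((x - mean) ** 2 for x in z) / len(z))
    return [(x - mean) / std for x in z]


def corr(a, b):
    """Pearson correlation of two equal-length series."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)


# Synthetic time series with a common factor (an assumption for illustration).
rng = random.Random(1)
T, N = 250, 4
common = [rng.gauss(0, 1) for _ in range(T)]
series = [
    [0.5 * (i + 1) * common[t] + rng.gauss(0, 1) for t in range(T)]
    for i in range(N)
]

# Correlation matrix (with ones on the diagonal), strengths, and total weight.
A = [[corr(zi, zj) for zj in series] for zi in series]
k = [sum(row) for row in A]
two_m = sum(k)

# Sum of the standardized time series.
z_hat = [standardize(z) for z in series]
z_tot = [sum(z[t] for z in z_hat) for t in range(T)]

# Largest discrepancy between the two sides of (2.10) over all pairs (i, j).
max_gap = max(
    abs(k[i] * k[j] / two_m - corr(series[i], z_tot) * corr(series[j], z_tot))
    for i in range(N)
    for j in range(N)
)
```

The discrepancy `max_gap` is at the level of floating-point rounding because the identity is algebraic: $k_i$ equals the covariance of $\hat{z}_i$ with $\hat{z}_{\mathrm{tot}}$, and $2m$ equals the variance of $\hat{z}_{\mathrm{tot}}$.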


The multiscale modularity-maximization problem in (2.6) was initially introduced in [58] using an ad hoc approach. Interestingly, one can derive this formulation of the maximization problem for sufficiently large values of $\gamma$ by considering a quality function based on a continuous-time Markov process $X(t)$ on an observed network [36, 37]. The probability density of a continuous-time Markov process with exponentially distributed waiting times at each node parametrized by $\lambda(i)$ satisfies

$$\dot{p} = p \Lambda M - p \Lambda, \tag{2.11}$$

where the vector $p(t) \in [0,1]^{1 \times N}$ is the probability density of a random walker at each node [i.e., $p_i(t) := \mathbb{P}(X(t) = i)$ for each $i$], $\Lambda$ is a diagonal matrix with the rate $\lambda(i)$ on its $i$th diagonal entry, and $M$ is the transition matrix of a random walker (i.e., $M_{ij} := A_{ij}/k_i$). The solution to equation (2.11) is $p(t) = p_0 e^{\Lambda(M - I)t}$, and its stationary distribution is $\pi = k^T \Lambda^{-1}/(2m)$. The stability of a partition is a quality function defined by [17, 36, 37]

$$r(S, t) = \mathrm{Tr}\left[S^T \left(\Pi e^{\Lambda(M - I)t} - \pi^T \pi\right) S\right],$$

where $\Pi_{ij} = \delta(i,j)\pi_i$. Equivalently, the stability is

$$r(C, t) = \sum_{i,j=1}^{N} \left(\left[\pi_i \left(e^{\Lambda(M - I)t}\right)_{ij}\right] - \left[\pi_i \pi_j\right]\right)\delta(c_i, c_j). \tag{2.12}$$

Taking $p_0 = \pi$, the term in brackets on the left-hand side of the difference in (2.12) is $\mathbb{P}(X(0) = i \cap X(t) = j)$, and the term in brackets on the right-hand side is $\mathbb{P}(X(0) = i \cap X(t \to \infty) = j)$ (provided the system is ergodic). The intuition behind the stability quality function is that a good partition at a given time before reaching stationarity corresponds to one in which the time that a random walker spends within communities is large compared with the time that it spends transiting between communities. In other words, a random walker that starts out at a community ends up there again in the early stages of the random walk, long before stationarity. The resulting maximization problem is $\max_{S \in \mathcal{S}} r(S, t)$, or equivalently $\max_{C \in \mathcal{C}} r(C, t)$. By linearizing $e^{\Lambda(M - I)t}$ at $t = 0$ and taking $\Lambda = I$, one obtains the multiscale modularity-maximization problem in (2.6) at short timescales with $\gamma = 1/t$ and $P_{ij} = k_i k_j/(2m)$. This approach provides a dynamical interpretation of the resolution parameter $\gamma$ as the inverse (after linearization) of the time used to explore a network by a random walker.
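The linearization step can be written out explicitly. The following is a short worked derivation rather than a quotation from the source; it assumes $\Lambda = I$ (so that $\pi_i = k_i/(2m)$), $k_i > 0$ for all $i$, and $t > 0$.

```latex
\begin{align*}
e^{(M - I)t} &= I + t\,(M - I) + O(t^2), \\
r(C, t) &\approx \sum_{i,j=1}^{N} \Big( \pi_i \big[ \delta_{ij} + t\,(M_{ij} - \delta_{ij}) \big] - \pi_i \pi_j \Big)\, \delta(c_i, c_j) \\
&= (1 - t) \sum_{i=1}^{N} \pi_i
   + \sum_{i,j=1}^{N} \left( t\,\frac{k_i}{2m}\,\frac{A_{ij}}{k_i} - \frac{k_i k_j}{(2m)^2} \right) \delta(c_i, c_j) \\
&= (1 - t) + \frac{t}{2m} \sum_{i,j=1}^{N} \left( A_{ij} - \frac{1}{t}\,\frac{k_i k_j}{2m} \right) \delta(c_i, c_j).
\end{align*}
```

The term $(1 - t)$ and the prefactor $t/(2m)$ do not depend on the partition, so maximizing the linearized stability over partitions is the same as solving (2.6) with $\gamma = 1/t$ and $P_{ij} = k_i k_j/(2m)$.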


Footnote (on equation (2.10)): The equality (2.10) holds for signed correlation networks. The strength of a node $i$ is given by the $i$th (signed) column or row sum of the correlation matrix.
2.4.2. Generalization of Newman-Girvan null network to signed networks (NGS). In [26], Gomez et al. proposed a generalization of the NG null network to signed networks. They separated $A$ into its positive and negative edge weights:

$$A = A^+ - A^-,$$

where $A^+$ denotes the positive part of $A$ and $-A^-$ denotes its negative part. Their generalization of the NG null network to signed networks (NGS) is $P_{ij} = k_i^+ k_j^+/(2m^+) - k_i^- k_j^-/(2m^-)$. This yields the maximization problem

$$\max_{C \in \mathcal{C}} \sum_{i,j=1}^{N} \left[\left(A_{ij}^+ - \frac{k_i^+ k_j^+}{2m^+}\right) - \left(A_{ij}^- - \frac{k_i^- k_j^-}{2m^-}\right)\right]\delta(c_i, c_j), \tag{2.13}$$

where $k_i^+$ and $2m^+$ (respectively, $k_i^-$ and $2m^-$) are the strengths and total edge weight in $A^+$ (respectively, $A^-$). The intuition behind this generalization is to use an NG null network on both unsigned matrices $A^+$ and $A^-$ but to count contributions to modularity from negative edge weights (i.e., the second group of terms in (2.13)) in an opposite way to those from positive edge weights (i.e., the first group of terms in (2.13)). Negative edge weights that exceed their expected edge weight are penalized (i.e., they decrease modularity), and those that do not are rewarded (i.e., they increase modularity). One can generate a network with edge weights 0, 1, or $-1$ and expected edge weights $k_i^+ k_j^+/(2m^+) - k_i^- k_j^-/(2m^-)$ by generating one network with expected edge weights $W_{ij}^+ = k_i^+ k_j^+/(2m^+)$ and a second network with expected edge weights $W_{ij}^- = k_i^- k_j^-/(2m^-)$ using the procedure described for the NG null network in Section 2.4.1 and then defining a network whose edge weights are given by the difference between the edge weights of these two networks. More generally, any probability distribution on the set of signed adjacency matrices $\{W \in \mathbb{R}^{N \times N}\}$ with the same properties as those for the NG null network for $W^+$ and $W^-$ (where $W = W^+ - W^-$ is defined as above) will have expected edge weights of $\mathbb{E}(W_{ij}) = k_i^+ k_j^+/(2m^+) - k_i^- k_j^-/(2m^-)$ for all $i, j \in \{1, \ldots, N\}$ (by linearity of the expectation).

The authors of [44| derived a variant of the multiscale formulation of modularity 


the same 
(where W = W+ - 


in (2.6) for the NGS null network at short time scales by building on the random-walk 
approach used to derive the NG null networklj They considered the function 


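$$\sum_{i,j=1}^{N}\left\{\pi_i\left[\delta_{ij} + t\Lambda_{ii}(M_{ij} - \delta_{ij})\right] - \pi_i\rho_{i|j}\right\}\delta(c_i,c_j), \tag{2.14}$$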




where the term in brackets on the left-hand side is a linearization of the exponential term in (2.12), $M$ and $\pi_i$ are as defined in (2.12) on a network with adjacency matrix $|A| := A^+ + A^-$, and $\rho_{i|j}$ is the probability of jumping from node $i$ to node $j$ at stationarity conditional on the network structure [44]. If the network is unipartite, unsigned, and undirected, then $\rho_{i|j}$ reduces to the stationary probability $\pi_j$.
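As a minimal illustration of the NGS null network, the following Python sketch computes $P_{ij} = k_i^+k_j^+/(2m^+) - k_i^-k_j^-/(2m^-)$ from a signed matrix. The function name `ngs_null_network` and the small example matrix are hypothetical choices made only for this sketch; they are not taken from [26] or [44].

```python
import numpy as np

def ngs_null_network(A):
    """Expected edge weights P_ij = k_i^+ k_j^+/(2m^+) - k_i^- k_j^-/(2m^-)."""
    A = np.asarray(A, dtype=float)
    A_pos = np.maximum(A, 0.0)    # A^+: positive edge weights
    A_neg = np.maximum(-A, 0.0)   # A^-: magnitudes of negative edge weights (A = A^+ - A^-)
    k_pos, k_neg = A_pos.sum(axis=1), A_neg.sum(axis=1)
    two_m_pos, two_m_neg = k_pos.sum(), k_neg.sum()
    P = np.zeros_like(A)
    if two_m_pos > 0:             # skip a term if the corresponding part is empty
        P += np.outer(k_pos, k_pos) / two_m_pos
    if two_m_neg > 0:
        P -= np.outer(k_neg, k_neg) / two_m_neg
    return P

# Hypothetical 3-node signed correlation-like matrix, for illustration only.
A = np.array([[1.0, 0.6, -0.2],
              [0.6, 1.0, -0.4],
              [-0.2, -0.4, 1.0]])
P = ngs_null_network(A)
```

In this sketch, each row sum of `P` equals the signed strength $k_i^+ - k_i^-$ of the corresponding node, which gives a quick consistency check on the construction.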


^In particular, they derived the multiscale formulation of modularity obtained using a Potts-model approach in [64]. This multiscale formulation results in one resolution parameter $\gamma_1$ for the term $k_i^+k_j^+/(2m^+)$ and a second resolution parameter $\gamma_2$ for the term $k_i^-k_j^-/(2m^-)$ in (2.13) (see [42] for an application of this multiscale formulation to the United Nations General Assembly resolution networks). Without an application-driven justification for how to choose these parameters, this increases the parameter space substantially, so we only consider the case $\gamma_1 = \gamma_2$ in this paper.
2.4.3. Uniform (U) null network. A third null network that we consider is a uniform (U) null network, with adjacency-matrix entries $P_{ij} = \langle k\rangle^2/(2m)$, where $\langle k\rangle := (\sum_{i=1}^N k_i)/N$ denotes the mean strength in a network. We thereby obtain the equivalent maximization problems


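$$\max_{C\in\mathcal{C}}\sum_{i,j=1}^{N}\left(A_{ij} - \gamma\frac{\langle k\rangle^2}{2m}\right)\delta(c_i,c_j) \;\Leftrightarrow\; \max_{S\in\mathcal{S}}\mathrm{Tr}\left[S^T\left(A - \gamma\frac{\langle k\rangle^2}{2m}\mathbf{1}_N\right)S\right], \tag{2.15}$$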


where $A$ is an unsigned adjacency matrix and $\mathbf{1}_N$ is an $N \times N$ matrix in which every entry is 1. The expected edge weight in (2.15) is constant and satisfies

$$\frac{\langle k\rangle^2}{2m} = \frac{\left(\sum_{i=1}^{N} k_i/N\right)^2}{2m} = \frac{2m}{N^2} = \langle A\rangle,$$

where $\langle A\rangle$ denotes the mean value of the adjacency matrix. One way to generate an unweighted network with expected adjacency matrix $\langle A\rangle\mathbf{1}_N$ is to generate each edge with probability $\langle A\rangle$ (provided $\langle A\rangle < 1$). That is, the presence and absence of an edge (including self-edges) are independent and identically distributed (i.i.d.) Bernoulli random variables with probability $\langle A\rangle$. More generally, any probability distribution on the set of adjacency matrices that satisfies $\mathbb{E}(\sum_{i,j=1}^{N} W_{ij}) = 2m$ and $\mathbb{E}(W_{ij}) = \mathbb{E}(W_{i'j'})$ for all $i,j,i',j'$ has an expected adjacency matrix $\mathbb{E}(W) = \langle A\rangle\mathbf{1}_N$. The adjacency matrix of the U null network is symmetric and positive semidefinite. One can derive the multiscale formulation in (2.6) for the U null network from the stability quality function in precisely the same way as it is derived for the NG null network, except that one needs to consider exponentially distributed waiting times at each node with rates proportional to node strength (i.e., $\Lambda_{ij} = \delta(i,j)k_i/\langle k\rangle$).
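The identity $\langle k\rangle^2/(2m) = \langle A\rangle$ and the Bernoulli construction above can be checked numerically. The following Python sketch uses a randomly generated symmetric matrix as a stand-in for an observed network (an assumption made only for illustration), and it samples the upper triangle once and mirrors it so that the sampled network is undirected.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Hypothetical unsigned, symmetric adjacency matrix with entries in [0, 1).
X = rng.random((N, N))
A = (X + X.T) / 2

k = A.sum(axis=1)                      # node strengths
two_m = k.sum()                        # total edge weight 2m
expected_weight = k.mean() ** 2 / two_m
mean_A = A.mean()                      # <A>, the mean entry of A

# Adjacency matrix of the U null network: every entry equals <A>.
P_uniform = mean_A * np.ones((N, N))

# One undirected realization: i.i.d. Bernoulli(<A>) edges (self-edges included),
# drawn on the upper triangle and mirrored to the lower triangle.
U = rng.random((N, N)) < mean_A
W = (np.triu(U) | np.triu(U, 1).T).astype(int)
```

Averaging many such realizations `W` would approach `P_uniform` entrywise; a single draw is shown here to keep the sketch short.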


3. Multilayer modularity maximization. 

3.1. Multilayer representation of temporal networks. We restrict our attention to temporal networks in which only edges vary in time. (Thus, each node is present in all layers.) We use the notation $A_s$ for a layer in a sequence of adjacency matrices $\mathcal{T} = \{A_1,\ldots,A_{|\mathcal{T}|}\}$, and we denote node $i$ in layer $s$ by $i_s$. We use the term multilayer network for a network defined on the set of nodes $\{1_1,\ldots,N_1;\, 1_2,\ldots,N_2;\, \ldots;\, 1_{|\mathcal{T}|},\ldots,N_{|\mathcal{T}|}\}$.

^For a network in which all nodes have the same strength, the uniform and Newman-Girvan null networks are equivalent because $k_i = k_j$ for all $i,j$ $\Leftrightarrow$ $k_i = 2m/N = \langle k\rangle$ for all $i$. This was pointed out for an application to foreign exchange markets in [20, 21].

^Although we use the uniform null network on unsigned adjacency matrices in this paper, the expected edge weight in the uniform null network is always nonnegative for correlation matrices, as positive semidefiniteness guarantees that $\langle A\rangle = \mathbf{1}^TA\mathbf{1}/N^2 \geq 0$ (where $\mathbf{1}$ is the vector of ones).

Fig. 3.1. Example of (left) a multilayer network with unweighted intra-layer connections (solid lines) and uniformly weighted inter-layer connections (dashed curves) and (right) its corresponding adjacency matrix. (The adjacency matrix that corresponds to a multilayer network is sometimes called a “supra-adjacency matrix” in the network-science literature.)

Thus far, computations that have used a multilayer framework for temporal networks have almost always assumed (1) that inter-layer connections exist only between nodes that correspond to the same entity (i.e., between nodes $i_s$ and $i_r$ for some $i$ and $s \neq r$) and (2) that the network layers are “ordinal” (i.e., inter-layer edges exist only between consecutive layers) [7, 44]. It is also typically assumed that (3) inter-layer connections are uniform (i.e., inter-layer edges have the same weight). In a recent review article on multilayer networks [35], condition (1) was called “diagonal” coupling, and condition (2) implies that a network is “layer-coupled”. We refer to the type of coupling defined by (1), (2), and (3) as ordinal diagonal and uniform inter-layer coupling, and we denote the value of the inter-layer edge weight by $\omega \in \mathbb{R}$. We show a simple illustration of a multilayer network with ordinal diagonal and uniform inter-layer coupling in Fig. 3.1. One can consider more general inter-layer connections (e.g., nonuniform ones). Although we restrict our attention to uniform coupling in our theoretical and computational discussions, we give an example of a nonuniform choice of inter-layer coupling later in the paper. Results similar to those of subsection 5.2 also apply in this more general case.
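To make ordinal diagonal and uniform inter-layer coupling concrete, the following Python sketch assembles the adjacency matrix of a small multilayer network. The function name `supra_adjacency` and the zero-based indexing convention (node $i$ in layer $s$ is stored at row $sN + i$) are assumptions of this sketch.

```python
import numpy as np

def supra_adjacency(layers, omega):
    """Block matrix with the layers on the diagonal and weight omega
    between copies of the same node in consecutive layers."""
    layers = [np.asarray(A, dtype=float) for A in layers]
    N, T = layers[0].shape[0], len(layers)
    supra = np.zeros((N * T, N * T))
    idx = np.arange(N)
    for s, A in enumerate(layers):
        supra[s * N:(s + 1) * N, s * N:(s + 1) * N] = A   # intra-layer edges
    for s in range(T - 1):
        supra[s * N + idx, (s + 1) * N + idx] = omega      # node i_s to node i_{s+1}
        supra[(s + 1) * N + idx, s * N + idx] = omega      # symmetric counterpart
    return supra

# Three hypothetical two-node layers, for illustration only.
layers = [np.array([[0, 1], [1, 0]]),
          np.array([[0, 0], [0, 0]]),
          np.array([[0, 1], [1, 0]])]
supra = supra_adjacency(layers, omega=0.5)
```

Only copies of the same node in consecutive layers are coupled, so entries that link different nodes across layers, or the same node across non-consecutive layers, stay at zero.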


3.2. The multilayer modularity function. The authors of [44] generalized the single-layer multiscale modularity-maximization problem in (2.6) to a multilayer network using an approach similar to the one used to derive the NGS null network from a stochastic Markov process on the observed network. For simplicity, we express intra-layer and inter-layer connections in an $N|\mathcal{T}|$-node multilayer network using a single $N|\mathcal{T}| \times N|\mathcal{T}|$ matrix. Each node $i_s$ in layer $s$ has the unique index $i' := i + (s-1)N$, and we use $\mathcal{A}$ to denote the multilayer adjacency matrix, which has entries $\mathcal{A}_{i'j'} = A_{ijs}\delta(s,r) + \omega\delta(i,j)\delta(|s-r|,1)$ (with $j' := j + (r-1)N$) when the inter-layer coupling is ordinal diagonal and uniform. (One can use either an adjacency tensor or an adjacency matrix to represent a multilayer network.) The generalization in [44] consists of applying the function in (2.14) to the $N|\mathcal{T}|$-node multilayer network:


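$$\hat{r}(C,t) = \sum_{i,j=1}^{N|\mathcal{T}|}\left\{\pi_i\left[\delta_{ij} + t\Lambda_{ii}(\mathcal{M}_{ij} - \delta_{ij})\right] - \pi_i\rho_{i|j}\right\}\delta(c_i,c_j), \tag{3.1}$$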


where $C$ is now a multilayer partition (i.e., a partition of an $N|\mathcal{T}|$-node multilayer network), $\Lambda$ is the $N|\mathcal{T}| \times N|\mathcal{T}|$ diagonal matrix with the rates of the exponentially distributed waiting times at each node of each layer on its diagonal, $\mathcal{M}$ (with entries $\mathcal{M}_{ij} := \mathcal{A}_{ij}/\sum_{l}\mathcal{A}_{il}$) is the $N|\mathcal{T}| \times N|\mathcal{T}|$ transition matrix for the $N|\mathcal{T}|$-node multilayer network with adjacency matrix $\mathcal{A}$, $\pi_i$ is the corresponding stationary distribution (with the strength of a node and the total edge weight now computed from the multilayer adjacency matrix $\mathcal{A}$), and $\rho_{i|j}$ is the probability of jumping from node $i$ to node $j$ at stationarity conditional on the structure of the network within and between layers. The authors’ choice of $\rho_{i|j}$, which accounts for the sparsity pattern of inter-layer edges in the multilayer network, leads to the multilayer modularity-maximization problem

$$\max_{C\in\mathcal{C}}\sum_{i,j=1}^{N|\mathcal{T}|}\mathcal{B}_{ij}\delta(c_i,c_j), \tag{3.2}$$

which we can also write as $\max_{C\in\mathcal{C}} Q(C|\mathcal{B})$, where $\mathcal{B}$ is the multilayer modularity matrix


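$$\mathcal{B} = \begin{bmatrix} B_1 & \omega I & 0 & \cdots & 0\\ \omega I & \ddots & \ddots & \ddots & \vdots\\ 0 & \ddots & \ddots & \ddots & 0\\ \vdots & \ddots & \ddots & \ddots & \omega I\\ 0 & \cdots & 0 & \omega I & B_{|\mathcal{T}|}\end{bmatrix} \tag{3.3}$$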


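and $B_s$ is a single-layer modularity matrix computed on layer $s$. (For example, $B_s = A_s - \langle A_s\rangle\mathbf{1}_N$ if one uses the U null network and sets $\gamma = 1$.) We rewrite the multilayer modularity-maximization problem in [44] as

$$\max_{C\in\mathcal{C}}\left[\sum_{s=1}^{|\mathcal{T}|}\sum_{i,j=1}^{N} B_{ijs}\delta(c_{i_s},c_{j_s}) + 2\omega\sum_{s=1}^{|\mathcal{T}|-1}\sum_{i=1}^{N}\delta(c_{i_s},c_{i_{s+1}})\right], \tag{3.4}$$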


where $B_{ijs}$ denotes the $(i,j)$th entry of $B_s$. Equation (3.4) clearly separates intra-layer contributions (left term) from inter-layer contributions (right term) to the multilayer quality function.
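The equivalence between the matrix form (3.2) and the split form (3.4) can be checked numerically on a small example. In the following Python sketch, the function names, the random layers, and the multilayer partition `labels` are hypothetical; each $B_s$ uses the U null network with $\gamma = 1$, as in the example above.

```python
import numpy as np

def multilayer_modularity_matrix(B_layers, omega):
    """Block-tridiagonal matrix with B_s on the diagonal and omega*I off it."""
    T, N = len(B_layers), B_layers[0].shape[0]
    B = np.zeros((N * T, N * T))
    for s, Bs in enumerate(B_layers):
        B[s * N:(s + 1) * N, s * N:(s + 1) * N] = Bs
    for s in range(T - 1):
        B[s * N:(s + 1) * N, (s + 1) * N:(s + 2) * N] = omega * np.eye(N)
        B[(s + 1) * N:(s + 2) * N, s * N:(s + 1) * N] = omega * np.eye(N)
    return B

def quality_matrix_form(B, labels):
    """Sum of B_ij over all pairs of multilayer nodes with the same label."""
    flat = np.asarray(labels).ravel()          # row s*N + i holds node i in layer s
    return float(B[flat[:, None] == flat[None, :]].sum())

def quality_split_form(B_layers, omega, labels):
    """Intra-layer term plus 2*omega times the number of label repeats across layers."""
    labels = np.asarray(labels)
    intra = sum(Bs[l[:, None] == l[None, :]].sum() for Bs, l in zip(B_layers, labels))
    repeats = int((labels[:-1] == labels[1:]).sum())
    return float(intra + 2 * omega * repeats)

rng = np.random.default_rng(0)
N, T, omega = 4, 3, 0.5
A_layers = [(X + X.T) / 2 for X in rng.random((T, N, N))]
B_layers = [A - A.mean() * np.ones((N, N)) for A in A_layers]   # U null network, gamma = 1
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 1, 1, 1]])                                # labels[s, i]: community of node i_s

q_matrix = quality_matrix_form(multilayer_modularity_matrix(B_layers, omega), labels)
q_split = quality_split_form(B_layers, omega, labels)
```

The two values agree because each pair $(i_s, i_{s+1})$ with equal labels appears twice in the symmetric matrix, which accounts for the factor of 2 in the inter-layer term.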

In practice, one can solve this multilayer modularity-maximization problem with the Louvain heuristic in subsection 2.2 by using the multilayer modularity matrix $\mathcal{B}$ instead of the single-layer modularity matrix $B$ as an input (the number of nodes in the first iteration of phase 1 becomes $N|\mathcal{T}|$ instead of $N$). It is clear from (3.4) that inter-layer merges decrease the value of the multilayer quality function when $\omega < 0$, so we only consider $\omega \geq 0$. Furthermore, although merging pairs of nodes with $\mathcal{B}_{ij} = 0$ into the same set does not change the value of the quality function, we will assume in the rest of the paper that sets in globally optimal partitions of the multilayer modularity-maximization problem (3.4) do not contain disconnected components in the $N|\mathcal{T}|$-node weighted graph with adjacency matrix $\mathcal{B}$.

In Section 5, we try to gain some insight into how to interpret a globally optimal multilayer partition by proving several properties that it satisfies. The results that we show are independent of the choice of matrices $B_1,\ldots,B_{|\mathcal{T}|}$, so (for example) they still apply when one uses the stability quality function in (2.12) on each layer instead of the modularity quality function. For ease of writing (and because modularity is the quality function that we use in our computational experiments), we will continue to refer to the maximization problem (3.4) as a multilayer modularity-maximization problem.

4. Interpretation of community structure in correlation networks with different null networks. It is clear from the structure of $\mathcal{B}$ in equation (3.3) that the choice of quality function within layers (i.e., diagonal blocks in the multilayer adjacency matrix) and the choice of coupling between layers (i.e., off-diagonal blocks) for a given quality function affect the solution of the maximization problem in (3.4). In this section, we make some observations on the choice of null network for correlation networks when using the modularity quality function. To do this, we consider the multilayer modularity-maximization problem (3.4) with zero inter-layer coupling (i.e., $\omega = 0$), which is equivalent to performing single-layer modularity maximization on each layer independently.

4.1. Toy examples. We describe two simple toy networks to illustrate some features of the NG (2.8) and NGS (2.13) null networks that can be misleading for asset correlation networks.


4.1.1. NG null network. Assume that the nodes in a network are divided into $K$ nonoverlapping categories (e.g., asset classes) such that all intra-category edge weights have a constant value $a > 0$ and all inter-category edge weights have a constant value $b$, with $0 < b < a$. Let $\kappa_i$ denote the category of node $i$, and rewrite the strength of node $i$ as

$$k_i = |\kappa_i|a + (N - |\kappa_i|)b = |\kappa_i|(a - b) + Nb.$$

The strength of a node in this network scales linearly with the number of nodes in its category. Suppose that we have two categories $\kappa_1, \kappa_2$ that do not contain the same number of nodes. Taking $|\kappa_1| > |\kappa_2|$ without loss of generality, it follows that

$$P_{i,j\in\kappa_1} = \frac{1}{2m}\left[|\kappa_1|(a-b) + Nb\right]^2 > \frac{1}{2m}\left[|\kappa_2|(a-b) + Nb\right]^2 = P_{i,j\in\kappa_2}, \tag{4.1}$$

where $P_{i,j\in\kappa_i}$ is the expected edge weight between pairs of nodes in $\kappa_i$ in the NG null network. That is, pairs of nodes in an NG null network that belong to larger categories have a larger expected edge weight than pairs of nodes that belong to smaller categories.


To see how equation (4.1) can lead to misleading results, we perform a simple experiment. Consider the toy network in Fig. 4.1(a) that contains 100 nodes divided into four categories of sizes 40, 30, 20, and 10. We set intra-category edge weights to 1 and inter-category edge weights to 0.3 (i.e., $a = 1$ and $b = 0.3$ in equation (4.1)). In Fig. 4.1(b) (respectively, Fig. 4.1(c)), we show the multiscale association matrix defined in (2.7) using an NG null network (respectively, a U null network). Colors scale with the frequency of co-classification of pairs of nodes into the same community across resolution-parameter values. Because the nodes are ordered by category, diagonal blocks in Fig. 4.1(b,c) indicate the co-classification index of nodes in the same category, and off-diagonal blocks indicate the co-classification index of nodes in different categories. We observe in Fig. 4.1(b) that larger categories are identified as a community across a smaller range of resolution-parameter values than smaller categories when using an NG null network. In particular, category $\kappa$ is identified as a single community when $\gamma < a/P_{i,j\in\kappa}$ (with $a/P_{i,j\in\kappa_1} < a/P_{i,j\in\kappa_2}$ when $|\kappa_1| > |\kappa_2|$ by equation (4.1)). When $\gamma > a/P_{i,j\in\kappa}$, category $\kappa$ is identified as $|\kappa|$ singleton communities. However, we observe in Fig. 4.1(c) that all four categories are identified as a single community across the same range of resolution-parameter values when using the U null network. In particular, category $\kappa$ is identified as a single community when $\gamma < a/\langle A\rangle$ and as $|\kappa|$ singleton communities when $\gamma > a/\langle A\rangle$.
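The thresholds quoted above can be reproduced directly from the toy adjacency matrix. The following Python sketch builds the block matrix of Fig. 4.1(a) (with the intra-category weight also placed on the diagonal, which is an assumption about self-edges) and evaluates $a/P_{i,j\in\kappa}$ for the NG null network and $a/\langle A\rangle$ for the U null network. It does not run any community-detection heuristic.

```python
import numpy as np

sizes = [40, 30, 20, 10]           # category sizes from the toy example
a, b = 1.0, 0.3                    # intra- and inter-category edge weights

labels = np.repeat(np.arange(len(sizes)), sizes)
A = np.where(labels[:, None] == labels[None, :], a, b)

k = A.sum(axis=1)                  # strengths |kappa_i|(a - b) + N b
two_m = k.sum()
P_ng = np.outer(k, k) / two_m      # NG null network k_i k_j / (2m)
mean_A = A.mean()                  # constant expected weight of the U null network

first = np.cumsum([0] + sizes[:-1])                   # first node of each category
ng_intra = np.array([P_ng[i, i] for i in first])      # P_{i,j in kappa} per category
ng_thresholds = a / ng_intra                          # one threshold per category
u_threshold = a / mean_A                              # the same threshold for every category
```

Under these assumptions, `ng_thresholds` increases as the category size decreases, whereas the U null network yields the single value `u_threshold` for all four categories.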

The intuition behind multiscale modularity maximization is that the communities that one obtains for larger values of $\gamma$ reveal “more densely” connected nodes in the observed network. Although all diagonal blocks in Fig. 4.1(a) have the same internal connectivity, different ones are identified as communities for different values of $\gamma$ when using the NG null network: as $\gamma$ increases, nodes in the largest category split into singletons first, followed by those in the second largest category, etc. One would need to be cautious in using multiscale community structure to gain information about connectivity patterns in the observed network in this example.

Fig. 4.1. (a) Toy unsigned block matrix with constant diagonal and off-diagonal blocks that take the value indicated in the block. (b) Multiscale association matrix of (a) that gives the frequency of co-classification of nodes across resolution-parameter values using an NG null network. (c) Multiscale association matrix of (a) that uses a U null network. (d) Toy signed block matrix with constant diagonal and off-diagonal blocks that take the value indicated in the block. (e) Multiscale association matrix of (d) that uses an NGS null network. (f) Multiscale association matrix of (d) that uses a U null network. For the NG and U (respectively, NGS) null networks, our sample of resolution-parameter values is the set $\{\gamma^-,\ldots,\gamma^+\}$ (respectively, $\{0,\ldots,\gamma^+\}$) with a uniform discretization step between each pair of consecutive values.


4.1.2. NGS null network. A key difference between an NG null network (2.8) and an NGS null network (2.13) is that the expected edge weight between two nodes must be positive in the former but can be negative in the latter. Consider a signed variant of the example in Section 4.1.1 in which intra-category edge weights equal a constant $a > 0$ and inter-category edge weights equal a constant $b < 0$. The strengths of node $i$ in category $\kappa$ are

$$k_i^+ = |\kappa|a \quad\text{and}\quad k_i^- = (N - |\kappa|)|b|.$$

We consider two categories $\kappa_1, \kappa_2$ with different numbers of nodes. Taking $|\kappa_1| > |\kappa_2|$ without loss of generality, it follows that

$$P_{i,j\in\kappa_1} = \frac{(|\kappa_1|a)^2}{2m^+} - \frac{\left[(N-|\kappa_1|)b\right]^2}{2m^-} > \frac{(|\kappa_2|a)^2}{2m^+} - \frac{\left[(N-|\kappa_2|)b\right]^2}{2m^-} = P_{i,j\in\kappa_2},$$

where $P_{i,j\in\kappa_i}$ is the expected edge weight between pairs of nodes in $\kappa_i$ in the NGS null network. As was the case for an NG null network, pairs of nodes in an NGS null network that belong to larger categories have a larger expected edge weight than pairs of nodes that belong to smaller categories.

However, the fact that the expected edge weight can be negative can further complicate interpretations of multiscale community structure. A category $\kappa$ for which $P_{i,j\in\kappa} < 0$ and $P_{i\in\kappa,j\notin\kappa} > 0$ is identified as a community when $A_{ij} > \gamma P_{ij}$ for all $i,j \in \kappa$ (this inequality must hold for sufficiently large $\gamma$ because $P_{i,j\in\kappa} < 0$) and does not split further for larger values of $\gamma$. This poses a particular problem in the interpretation of multiscale community structure obtained with the NGS null network because nodes with negative expected edge weights do not need to be “densely connected” in the observed network to contribute positively to modularity. In fact, if one relaxes the assumption of uniform edge weights across categories, one can ensure that nodes in the category with lowest intra-category edge weight will never split. This is counterintuitive to standard interpretations of multiscale community structure [36].


In Fig. 4.1(d,e), we illustrate the above feature of the NGS null network using a simple example. The toy network in Fig. 4.1(d) contains 100 nodes divided into three categories: one of size 50 and two of size 25. The category of size 50 and one category of size 25 have an intra-category edge weight of 1 between each pair of nodes. The other category of size 25 has an intra-category edge weight of 0.4 between each pair of nodes. All inter-category edges have weights of $-0.05$. (We choose these values so that the intra-category expected edge weight is negative for the third category but positive for the first two and so that inter-category expected edge weights are positive.) We observe in Fig. 4.1(e) that the first and second categories split into singletons for sufficiently large $\gamma$, that the smaller of the two categories splits into singletons for a larger value of the resolution parameter, and that the third category never splits. Repeating the same experiment with the U null network in Fig. 4.1(f) (after a linear shift of the adjacency matrix to the interval $[0,1]$, i.e., $A_{ij} \mapsto \frac{1}{2}(A_{ij} + 1)$ for all $i,j$), we observe that the co-classification index of nodes reflects the value of the edge weight between them. It is highest for pairs of nodes in the first and second category, and it is lowest for pairs of nodes in the third category.
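The signs of the expected edge weights claimed for this toy network can be verified with a few lines of Python. The sketch below rebuilds the signed block matrix of Fig. 4.1(d) (placing the intra-category weight on the diagonal is an assumption about self-edges) and evaluates the NGS expected weights for one representative node per category.

```python
import numpy as np

sizes = [50, 25, 25]               # category sizes from the toy example
intra = np.array([1.0, 1.0, 0.4])  # intra-category edge weights
b = -0.05                          # inter-category edge weight

labels = np.repeat(np.arange(len(sizes)), sizes)
same = labels[:, None] == labels[None, :]
A = np.where(same, intra[labels][:, None], b)

A_pos, A_neg = np.maximum(A, 0.0), np.maximum(-A, 0.0)   # A = A^+ - A^-
k_pos, k_neg = A_pos.sum(axis=1), A_neg.sum(axis=1)
P = (np.outer(k_pos, k_pos) / k_pos.sum()
     - np.outer(k_neg, k_neg) / k_neg.sum())              # NGS null network

reps = [0, 50, 75]                 # one representative node per category
P_cat = P[np.ix_(reps, reps)]      # diagonal: intra-category; off-diagonal: inter-category

A_shifted = (A + 1.0) / 2.0        # linear shift to [0, 1] used with the U null network
```

The diagonal of `P_cat` is positive for the first two categories and negative for the third, and its off-diagonal entries are all positive, which matches the parenthetical remark in the text.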


4.2. Data sets. We show how the features discussed in Section 4.1 can manifest in real data. We use two data sets of financial time series for our computational experiments. The first is a multi-asset data set and consists of weekly price time series for $N = 98$ financial assets during the time period 01 Jan 99-01 Jan 10 (resulting in 574 prices for each asset). The assets are divided into seven asset classes: 20 government bond indices (Gov.), 4 corporate bond indices (Corp.), 28 equity indices (Equ.), 15 currencies (Cur.), 9 metals (Met.), 4 fuel commodities (Fue.), and 18 commodities (Com.). This data set was studied previously using principal component analysis, and a detailed description of the financial assets can be found in that study.

The second data set is a single-asset data set that consists of daily price time series for $N = 859$ financial assets from the Standard & Poor’s (S&P) 1500 during the time period 01 Jan 99-01 Jan 13 (resulting in 3673 prices for each asset). The financial assets are all equities and are divided into ten sectors: 62 materials, 141 industrials, 150 financials, 142 information technology, 55 utilities, 47 consumer staples, 138 consumer discretionary, 48 energy, 68 health care, and 6 telecommunication services.

^We consider fewer than 1500 nodes because we only include nodes for which data is available at all time points to avoid issues associated with choices of data-cleaning techniques.

Fig. 4.2. Surface plots of the correlations over all time windows for (a) the first data set (238 time windows) and (b) the second data set (854 time windows). The colors in each panel scale with the value of the observed frequency.

The precise way that one chooses to compute a measure of similarity between pairs of time series and the subsequent choices that one makes (e.g., uniform or nonuniform window length, and overlap or no overlap if one uses a rolling time window) affect the values of the similarity measure. There are myriad ways to define similarity measures (the best choices depend on facets such as application domain, time-series resolution, and so on), and this is an active and contentious area of research [56, 61, 63, 69]. Constructing a similarity matrix from a set of time series and investigating community structure in a given similarity matrix are separate problems, and we are concerned with the latter in the present paper. Accordingly, in all of our experiments, we use Pearson correlation coefficients for our measure of similarity. We compute them using a rolling time window with a uniform window length and uniform amount of overlap.

We adopt the same network representation for both data sets. We use the term time window for a set of discrete time points and divide each time series into overlapping time windows that we denote by $\mathcal{T} = \{T_s\}$. The length $|T|$ of each time window and the amount of overlap $|T| - \delta t$ between consecutive time windows are uniform. We fix $(|T|,\delta t) = (100,2)$ for the first data set (which amounts to roughly two years of data in each time window) and $(|T|,\delta t) = (260,4)$ for the second data set (which amounts to roughly one year of data in each time window). Every network layer with adjacency matrix $A_s$ is a Pearson correlation matrix between the time series of logarithmic returns during the time window $T_s$. For each data set, we study the sequence of matrices

$$\{A_s \in [-1,1]^{N\times N} \mid s \in \{1,\ldots,|\mathcal{T}|\}\}.$$

We show a surface plot of the observed frequency of correlations in each layer for each data set in Fig. 4.2.

^The amount of overlap determines the number of data points that one adds and removes from each time window. It thus determines the number of data points that can alter the connectivity patterns in each subsequent correlation matrix (i.e., each subsequent layer).
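The construction of the layers can be sketched in Python as follows. The function names and the synthetic price series are hypothetical, and the convention that each window contains $|T|$ consecutive price observations (so that each layer uses $|T| - 1$ logarithmic returns) is an assumption of this sketch rather than a statement about the original computations.

```python
import numpy as np

def window_starts(n_points, window, step):
    """Start indices of all full windows of `window` time points, shifted by `step`."""
    return range(0, n_points - window + 1, step)

def rolling_correlation_layers(prices, window, step):
    """prices has shape (n_points, N); returns one N x N Pearson matrix per window."""
    log_prices = np.log(np.asarray(prices, dtype=float))
    layers = []
    for t in window_starts(log_prices.shape[0], window, step):
        returns = np.diff(log_prices[t:t + window], axis=0)   # log returns inside the window
        layers.append(np.corrcoef(returns, rowvar=False))     # columns are assets
    return layers

# Synthetic prices for 4 hypothetical assets over 60 time points.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal((60, 4)), axis=0))
layers = rolling_correlation_layers(prices, window=20, step=5)
```

With this convention, `len(window_starts(574, 100, 2))` equals 238 and `len(window_starts(3673, 260, 4))` equals 854, which agrees with the numbers of time windows quoted for the two data sets in Fig. 4.2.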
Fig. 4.3. Multiscale association matrix for the U, NG, and NGS null networks for the entire correlation matrix and a subset of the correlation matrix in the last layer of the first data set. In panel (a), we show the entire matrix; in panels (b,c,d), we show the multiscale association matrix that we obtain from this matrix using each of the three null networks. In panel (e), we show the first $35 \times 35$ block of the correlation matrix from panel (a); and in panels (f,g,h), we show the multiscale association matrix that we obtain from this subset of the correlation matrix using each of the three null networks. The colors scale with the entries of the multiscale association matrix and the entries of the correlation matrix. Black squares on the diagonals correspond to government and corporate bond assets, and white squares correspond to equity assets.


4.3. Multiscale community structure in asset correlation networks. We perform the same experiments as in Fig. 4.1 on the correlation matrices of both data sets. Our resolution-parameter sample is the set $\{\gamma^-,\ldots,\gamma^+\}$ (respectively, $\{0,\ldots,\gamma^+\}$) for the U and NG (respectively, NGS) null networks with a uniform discretization step. We store the co-classification index of pairs of nodes averaged over all resolution-parameter values in the sample. We use the U and NG null networks for a correlation matrix that is linearly shifted to the interval $[0,1]$. For each null network, we thereby produce $|\mathcal{T}|$ multiscale association matrices with entries between 0 and 1 that indicate how often pairs of nodes are in the same community across resolution-parameter values.


We show the multiscale association matrices for a specific layer of data set 1 in Fig. 4.3. The matrix in Fig. 4.3(a) corresponds to the correlation matrix during the interval 08 Feb 08-01 Dec 10. In accord with the results in [22], this matrix reflects the increase in correlation between financial assets that took place after the Lehman bankruptcy in 2008. (One can also see this feature in the surface plot of Fig. 4.2(a).) The matrices in Fig. 4.3(b,c,d) correspond, respectively, to the multiscale association matrix for the U, NG, and NGS null networks. We reorder all matrices (identically) using a node ordering based on the partitions that we obtain with the U null network that emphasizes block-diagonal structure in the correlation matrix. We observe that the co-classification indices in the multiscale association matrix of Fig. 4.3(b) are a better reflection of the strength of correlation between assets in Fig. 4.3(a) than the multiscale association matrices in Fig. 4.3(c,d). As indicated by the darker shades of red in the upper left corner in Fig. 4.3(c,d), we also observe that the government and corporate bond assets (black squares on the diagonal) remain in the same community across a larger range of resolution-parameter values than the equity assets (white squares on the diagonal). In fact, when we use an NGS null network, the expected weight between two government or corporate bonds is negative (it is roughly $-0.1$), and these assets remain in the same community across arbitrarily large values of the resolution parameter. One would need to be cautious in using the multiscale association matrices in Fig. 4.3(c,d) to gain insight about the connectivity between assets in Fig. 4.3(a).


When studying correlation matrices of multi-asset data sets, one may wish to vary the size of the asset classes included in the data (e.g., by varying the ratio of equity and bond assets). We show how doing this can lead to further misleading conclusions. By repeating the same experiment using only a subset of the correlation matrix (the first 35 nodes), we consider an example where we have inverted the relative sizes of the bond asset class and the equity asset class. As indicated by the darker shades of red in the lower right corner in Fig. 4.3(g,h), equity assets now have a higher co-classification index than government and corporate bond assets. If one uses the co-classification index in the multiscale association matrices of Fig. 4.3(c,d) (respectively, Fig. 4.3(g,h)) to gain information about the observed correlation between equity and bond assets in Fig. 4.3(a) (respectively, Fig. 4.3(e)), one may draw different conclusions despite the fact that these have not changed. However, the multiscale association matrix with a U null network in Fig. 4.3(f) reflects the observed correlation between equity and bond assets in Fig. 4.3(e).


To quantify the sense in which a multiscale association matrix of one null network “reflects” the values in the correlation matrix, we compute the Pearson correlation between the upper triangular part of each multiscale association matrix and its corresponding adjacency matrix across all time layers of both data sets for the U, NG, and NGS null networks. We show these correlation plots in Fig. 4.4. Observe that the correlation between the adjacency and multiscale association matrix in Fig. 4.4(a,b) is highest in each layer for the U null network and lowest in (almost) each layer for the NGS null network.
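The diagnostic described above amounts to correlating the strictly upper triangular entries of two symmetric matrices. A minimal Python sketch follows; the function name and the example matrices are hypothetical, and the association matrix is deliberately chosen to be an affine function of the adjacency matrix so that the result is easy to check.

```python
import numpy as np

def upper_triangle_correlation(X, Y):
    """Pearson correlation between the strictly upper triangular entries of X and Y."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    rows, cols = np.triu_indices(X.shape[0], k=1)   # k=1 excludes the diagonal
    return float(np.corrcoef(X[rows, cols], Y[rows, cols])[0, 1])

rng = np.random.default_rng(0)
R = rng.uniform(-1.0, 1.0, size=(6, 6))
adjacency = (R + R.T) / 2                # hypothetical symmetric matrix with entries in [-1, 1]
np.fill_diagonal(adjacency, 1.0)
association = (adjacency + 1.0) / 2      # hypothetical association matrix, affine in the adjacency
r = upper_triangle_correlation(adjacency, association)
```

Because `association` is an increasing affine function of `adjacency` in this example, `r` equals 1 up to rounding. For real data, one such value per layer would yield curves of the kind described for Fig. 4.4.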

^The authors of [65] showed that a globally optimal partition for a null network called the “constant Potts model” (CPM), in which the edge weights are given by a constant that is independent of the network, is “sample-independent”. Their result can be generalized as follows for the U null network (in which expected edge weights are constant but are not independent of the observed network). Suppose that $C_{\max}$ is a partition that maximizes $Q(C|A; P; \gamma_1)$ and consider the subgraph induced by the network on a set of communities $C_1,\ldots,C_l \in C_{\max}$. Then $\{C_1, C_2, \ldots, C_l\}$ maximizes $Q(C|\tilde{A}; P; \gamma_2)$, where $\tilde{A}$ is the adjacency matrix of the induced subgraph and $\gamma_2 = \gamma_1\langle A\rangle/\langle\tilde{A}\rangle$. For the CPM null network, the same result holds with $\gamma_1 = \gamma_2$.

Fig. 4.4. Correlation between the adjacency matrix and the multiscale association matrix for the U (solid curve), NG (dashed curve), and NGS (dotted curve) null networks over all time layers for (a) data set 1 and (b) data set 2. We compute the Pearson correlation coefficients between entries in the upper triangular part of each matrix (to avoid double counting, as the matrices are symmetric), and we exclude diagonal entries (which, by construction, are equal to 1 in both matrices).

The above observation can be explained as follows. Recall from (2.5) that we can write the modularity-maximization problem as $\max_{S\in\mathcal{S}}\mathrm{Tr}(S^TBS)$, where $\mathcal{S}$ is the set of partition matrices. When one uses a U null network, the entries of the modularity matrix are the entries of the adjacency matrix shifted by a constant $\gamma\langle A\rangle$, and the maximization problem reduces to

$$\max_{S\in\mathcal{S}}\left[\mathrm{Tr}(S^TAS) - \gamma\langle A\rangle\|c(S)\|_2^2\right], \tag{4.2}$$

where $\|c(S)\|_2^2 = \mathrm{Tr}(S^T\mathbf{1}_NS)$ and $\|c(S)\|_2$ is the 2-norm of the vector of set sizes in $S$ (i.e., $c(S)$ is the vector whose $k$th entry is $\sum_{i=1}^{N} S_{ik}$). It follows that modularity maximization with a U null network is equivalent to a block-diagonalization of the adjacency matrix $A$ (the first term in (4.2)) with a penalty on the size of communities (the second term). As one increases the resolution parameter, one favors smaller sets of nodes with stronger internal connectivity. Note that one could also apply equation (4.2) on adjusted adjacency matrices $A' = A - \hat{A}$. For example, one can let $\hat{A}$ be a matrix that controls for random fluctuations in a correlation matrix $A$ (e.g., the “random component” in [41]).
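The reduction to (4.2) can be verified numerically for any partition. In the Python sketch below, the random matrix `A`, the partition `labels`, and the value of `gamma` are hypothetical; the partition matrix `S` has one row per node and one column per set, with a single 1 in each row.

```python
import numpy as np

rng = np.random.default_rng(1)
N, gamma = 8, 1.3

X = rng.random((N, N))
A = (X + X.T) / 2                                  # hypothetical unsigned adjacency matrix
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])       # hypothetical partition into three sets
S = np.eye(labels.max() + 1)[labels]               # N x 3 partition matrix

B = A - gamma * A.mean() * np.ones((N, N))         # modularity matrix with the U null network
lhs = np.trace(S.T @ B @ S)                        # Tr(S^T B S)

set_sizes = S.sum(axis=0)                          # c(S), the vector of set sizes
rhs = np.trace(S.T @ A @ S) - gamma * A.mean() * np.sum(set_sizes ** 2)
```

The two quantities `lhs` and `rhs` agree, which makes explicit that the second term penalizes the sum of squared set sizes and therefore grows with $\gamma$ for partitions that contain large sets.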

For a general null network, equation ( |4.2[ ) takes the form 
max [Tr(S^AS) - Tr(S^( 7 P)S)] , 

where P is the adjacency matrix of the null network. That is, modularity maximiza¬ 
tion finds block-diagonal structure in A (first term) that is not in 7 P (second term). 
It is common to avoid using the U null network in applications because “it is not a good representation of most real-world networks” [47]. The extent to which one wants a null network to be a good representation of an observed network depends on the features in an application for which one wants to control. We argue that whether an NG null network is more appropriate than a U null network for a given situation depends at least in part on one’s interpretation of node strength for that application. As we discussed in Section 2.4, the strength of a node in correlation matrices is given by the covariance between its standardized time series and the mean time series. When using the NG null network, it thus follows that pairwise differences $B_{ij} - B_{i'j'}$ in the modularity quality function depend on $\mathrm{corr}(z_i, z_j)$, $\mathrm{corr}(z_{i'}, z_{j'})$, and $\mathrm{corr}(z_k, z_{\mathrm{tot}})$, where $k \in \{i, j, i', j'\}$, the quantity $z_i$ is the standardized time series of asset $i$ defined in subsection 2.4.1, and $z_{\mathrm{tot}} = \sum_{i=1}^{N} z_i$. When using the U null network, pairwise differences in the modularity quality function depend only on the observed edge weights $\mathrm{corr}(z_i, z_j)$ and $\mathrm{corr}(z_{i'}, z_{j'})$. In contrast, the term $\mathrm{corr}(z_k, z_{\mathrm{tot}})$ introduces a dependence between the communities that one finds with the NG null network and the extent to which nodes in those communities are representative [as measured by $\mathrm{corr}(z_k, z_{\mathrm{tot}})$] of the mean time series for the sample. In situations in which one may wish to vary one’s node sample (e.g., by changing the size of asset classes), one needs to bear such dependencies in mind when interpreting the communities that one obtains.


5. Effect of inter-layer coupling on a multilayer partition. In Section 4, we set the inter-layer connection weights to 0 in the multilayer network. The solution to the multilayer modularity-maximization problem (3.4) then depends solely on the values in the modularity matrix of each time layer, and the multilayer modularity-maximization problem reduces to performing single-layer modularity maximization on each layer independently.

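Recall the multilayer modularity-maximization problem

$$\max_{C \in \mathcal{C}} \left[ \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c_{i_s}, c_{j_s}) + 2\omega \sum_{s=1}^{|\mathcal{T}|-1} \sum_{i=1}^{N} \delta(c_{i_s}, c_{i_{s+1}}) \right].$$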

A solution to this problem is a partition of an $N|\mathcal{T}|$-node multilayer network. Its communities can contain nodes from the same layer and nodes from different layers. Nodes from different layers can be the same node at different times ($i_s$ and $i_r$ with $s \neq r$) or different nodes at different times ($i_s$ and $j_r$ with $i \neq j$ and $s \neq r$). We say that a node $i_s$ remains in the same community (respectively, changes communities) between consecutive layers $s$ and $s+1$ if $\delta(c_{i_s}, c_{i_{s+1}}) = 1$ (respectively, $\delta(c_{i_s}, c_{i_{s+1}}) = 0$).

Positive ordinal, diagonal, and uniform inter-layer connections favor nodes remaining in the same community between consecutive layers. Every time a node does not change communities between two consecutive layers (i.e., $\delta(c_{i_s}, c_{i_{s+1}}) = 1$), a positive contribution of $2\omega$ is added to the multilayer quality function. One thereby favors communities that do not change in time because community assignments are transitive: if $\delta(c_{i_s}, c_{j_r}) = 1$ and $\delta(c_{i_s}, c_{k_t}) = 1$, then $\delta(c_{j_r}, c_{k_t}) = 1$.

We define the persistence of a multilayer partition to be the total number of nodes that do not change communities between layers:

$$\mathrm{Pers}(C) := \sum_{s=1}^{|\mathcal{T}|-1} \sum_{i=1}^{N} \delta(c_{i_s}, c_{i_{s+1}}) \in \{0, \ldots, N(|\mathcal{T}|-1)\}. \tag{5.1}$$

As indicated in equation (5.1), $\mathrm{Pers}(C)$ is an integer between 0, which occurs when no node ever remains in the same community across layers, and $N(|\mathcal{T}|-1)$, which occurs when every node always remains in the same community. (A closely related measure called “flexibility” has been applied to functional brain networks.) Let $\mathrm{Pers}(C)|_s$ denote the number of nodes that remain in the same community between two consecutive layers $s$ and $s+1$:

$$\mathrm{Pers}(C)|_s := \sum_{i=1}^{N} \delta(c_{i_s}, c_{i_{s+1}}) \in \{0, \ldots, N\}, \tag{5.2}$$

so that $\mathrm{Pers}(C) = \sum_{s=1}^{|\mathcal{T}|-1} \mathrm{Pers}(C)|_s$. Persistence provides an insightful way of rewriting the multilayer modularity-maximization problem:

$$\max_{C \in \mathcal{C}} \left[ \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c_{i_s}, c_{j_s}) + 2\omega\, \mathrm{Pers}(C) \right]. \tag{5.3}$$

The multilayer maximization problem thus measures a trade-off between static community structure within layers (the first term in (5.3)) and temporal persistence across layers (the second term in (5.3)).
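The decomposition in (5.3) is straightforward to evaluate for a given partition. The following sketch assumes a representation of our own choosing: `B[s][i][j]` holds the single-layer modularity entries and `labels[s][i]` holds the community label of node $i$ in layer $s$. The numerical values are hypothetical and serve only to illustrate the two terms.

```python
def persistence(labels):
    """Pers(C): number of (node, consecutive-layer pair) combinations that keep their label."""
    return sum(
        1
        for prev, cur in zip(labels, labels[1:])
        for a, b in zip(prev, cur)
        if a == b
    )


def intra_layer_term(B, labels):
    """Sum of B[s][i][j] over all layers s and node pairs (i, j) with the same label in layer s."""
    total = 0.0
    for B_s, lab in zip(B, labels):
        n = len(lab)
        total += sum(B_s[i][j] for i in range(n) for j in range(n) if lab[i] == lab[j])
    return total


def multilayer_quality(B, labels, omega):
    """Objective of (5.3): intra-layer term plus 2 * omega * Pers(C)."""
    return intra_layer_term(B, labels) + 2.0 * omega * persistence(labels)


# Hypothetical two-layer, three-node example (illustrative numbers only).
B = [
    [[-0.5, 0.5, -0.5], [0.5, -0.5, -0.5], [-0.5, -0.5, -0.5]],
    [[-0.5, -0.5, -0.5], [-0.5, -0.5, 0.5], [-0.5, 0.5, -0.5]],
]
labels = [[0, 0, 1], [0, 1, 1]]  # labels[s][i]: community of node i in layer s

pers = persistence(labels)
q0 = multilayer_quality(B, labels, omega=0.0)
q1 = multilayer_quality(B, labels, omega=1.0)
```

Changing `omega` leaves the intra-layer term untouched and shifts the objective by $2\omega\,\mathrm{Pers}(C)$, which is the trade-off described above.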
To better understand the effect of nonzero inter-layer coupling on partitions that one obtains without inter-layer coupling (i.e., $\omega = 0$), we introduce notation that helps to compare a multilayer partition to single-layer partitions. We denote by $\mathcal{N}_s := \{1_s, \ldots, N_s\}$ the set of nodes in layer $s$. The restriction of a set of nodes $C_l \subseteq \{1_1, \ldots, N_1; 1_2, \ldots, N_2; \ldots; 1_{|\mathcal{T}|}, \ldots, N_{|\mathcal{T}|}\}$ to a layer $s$ is $C_l|_s := C_l \cap \mathcal{N}_s$, and we define the partition induced by a multilayer partition $C \in \mathcal{C}$ on layer $s$ by

$$C|_s := \{C_l|_s,\ C_l \in C\}.$$

We refer to a “globally optimal partition” as an “optimal partition” in this section for ease of writing. In the next two subsections, we illustrate how the set of partitions induced by a multilayer partition with $\omega > 0$ on individual layers can differ from intra-layer partitions obtained with $\omega = 0$.

5.1. Toy examples. 

5.1.1. Changes in connectivity patterns. This toy example illustrates how inter-layer coupling can enable us to detect and differentiate between changes in connectivity patterns across layers. In Fig. 5.1, we show an unweighted multilayer network with $|\mathcal{T}| = 10$ layers and $N = 8$ nodes in each layer. Every layer except for layers 3 and 6 contains two 4-node cliques. In layer 3, node $5_3$ is connected to nodes $\{1_3, 2_3\}$ instead of nodes $\{6_3, 7_3, 8_3\}$. In layer 6, node $5_6$ is connected to nodes $\{1_6, 2_6, 3_6, 4_6\}$ instead of nodes $\{6_6, 7_6, 8_6\}$. We show the layers of the multilayer network in panels (a)-(c) of Fig. 5.1. We examine its communities using a U null network with a resolution-parameter value of $\gamma = 1$. Layer $s$ then has the following single-layer modularity matrix:

$$B_{ijs} = \begin{cases} 1 - \langle A_s \rangle, & \text{if } i_s \text{ is connected to } j_s, \\ -\langle A_s \rangle, & \text{otherwise.} \end{cases}$$

The optimal partition in each layer is unique and is $C_s = \{\{1_s, 2_s, 3_s, 4_s\}, \{5_s, 6_s, 7_s, 8_s\}\}$ in layer $s$ for $s \notin \{3, 6\}$ and is $C_s = \{\{1_s, 2_s, 3_s, 4_s, 5_s\}, \{6_s, 7_s, 8_s\}\}$ in layers 3 and 6. When the value of inter-layer coupling is 0, the optimal multilayer partition is the union of $|\mathcal{T}|$ disconnected optimal single-layer partitions. The resulting multilayer partition, which we show in panel (d) of Fig. 5.1, has a persistence of $\mathrm{Pers}(C) = 0$. We denote this partition by $C_0$, where $C_0|_s = C_s$. For any $\omega > 0$, any partition with the same intra-layer partitions as $C_0$ and a nonzero value of persistence yields a higher value of multilayer modularity than $C_0$. This follows immediately from the expression of the multilayer quality function:

$$Q(C|\mathcal{B}) = \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c_{i_s}, c_{j_s}) + 2\omega\, \mathrm{Pers}(C).$$

Increasing persistence without changing intra-layer partitions increases the second term of $Q(C|\mathcal{B})$ without changing the first. (In Section 5.2, we prove that $\omega > 0$ is both necessary and sufficient for an optimal partition to have a positive value of persistence.) To obtain the multilayer partition in panel (e), we combine all of the sets in panel (d) that contain $1_s$ into one set and all of the sets that contain $N_s$ into another set. This partition has a persistence equal to $N(|\mathcal{T}| - 1) - 4$, and any other way of combining the sets in $C_0$ yields a lower value of persistence.

We now examine Fig. 5.1 further. Consider the multilayer partitions in panel (e), where both changes in network structure occur; panel (f), where only
[Figure 5.1, panels: (a) Layer $s \notin \{3, 6\}$; (b) Layer 3; (c) Layer 6; (d) Partition $C_0$; (e) Partition $C_1$; (f) Partition $C_2$; (g) Partition $C_3$; (h) Change in multilayer modularity value with respect to partition $C_1$.]


Fig. 5.1. Toy example illustrating the use of ordinal diagonal and uniform inter-layer coupling for detecting changes in community structure across layers. We consider ten layers ($|\mathcal{T}| = 10$) with eight nodes ($N = 8$) in each layer. We show the network structures in (a) layers $s \notin \{3, 6\}$, (b) layer 3, and (c) layer 6. Panels (d)-(g) illustrate four different multilayer partitions. In each panel, the $s$th column of circles represents the nodes in the $s$th layer, which we order from 1 to 8. We show sets of nodes in the same community using solid curves in panel (d) (to avoid having to use 20 distinct colors) and using colors in panels (e)-(g). In panel (h), we show the difference between the multilayer modularity values of the partitions in panels (f) (thin line) and (g) (thick line) and that of the partition in panel (e) for different values of $\omega$. We include the horizontal dotted line to show the point at which the thin line intercepts the horizontal axis. The panel labels in the regions defined by the area between two consecutive vertical lines in panel (h) indicate which of the multilayer partitions in panels (e), (f), and (g) has a higher value of modularity.


the stronger change occurs; and panel (g), where neither change occurs. We denote these multilayer partitions by $C_1$, $C_2$, and $C_3$, respectively, and we note that $\mathrm{Pers}(C_1) < \mathrm{Pers}(C_2) < \mathrm{Pers}(C_3)$. The value $\omega$ of inter-layer coupling determines which partition of these three has the highest value of multilayer modularity. To see this, we compute the modularity cost of changing static community structure within layers in partition $C_1$ in favor of persistence. (Such a computation is a multilayer version of the calculations for static networks in [28].) The intra-layer modularity cost in $C_1$ of moving node $5_s$ from the community $\{1_s, 2_s, 3_s, 4_s, 5_s\}$ to the community $\{6_s, 7_s, 8_s\}$ in layers $s \in \{3, 6\}$ is

$$\Delta Q(s) = 2\left( \sum_{j \in \{6,7,8\}} B_{5js} - \sum_{j \in \{1,2,3,4\}} B_{5js} \right) = \begin{cases} -4 + 2\langle A_3 \rangle \approx -3.3, & \text{if } s = 3, \\ -8 + 2\langle A_6 \rangle \approx -7.2, & \text{if } s = 6. \end{cases}$$


The inter-layer modularity contribution from this move is $4\omega$ in both cases; the first factor of 2 follows by symmetry of $\mathcal{B}$, and the second factor of 2 follows from the fact that either move increases persistence by 2. Consequently, for $0 < 4\omega < |\Delta Q(3)|$, the partition in panel (e) yields a higher multilayer modularity value than the partitions in (f) and (g). When $|\Delta Q(3)| < 4\omega < |\Delta Q(6)|$, the multilayer modularity value of the partition in (f) is higher than those of (e) or (g). Finally, when $4\omega > |\Delta Q(6)|$, the partition in panel (g) has the highest multilayer modularity value. When $4\omega = |\Delta Q(3)|$ (respectively, $4\omega = |\Delta Q(6)|$), the multilayer partitions in panels (e) and (f) (respectively, (f) and (g)) have the same value of multilayer modularity. We illustrate these results in Fig. 5.1(h) by plotting $Q(C_2|\mathcal{B}) - Q(C_1|\mathcal{B})$ and $Q(C_3|\mathcal{B}) - Q(C_1|\mathcal{B})$ against $\omega$. This example is a simple illustration of how inter-layer connections can help distinguish between changes in connectivity patterns: stronger changes (in terms of modularity cost) persist across larger values of inter-layer coupling. (Other approaches to “change point detection” in temporal networks also exist.)
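The thresholds in this toy example can be checked numerically. The sketch below rebuilds the ten layers as described above and evaluates the quality function for the three partitions. It is a minimal illustration under stated assumptions: nodes are 0-indexed in the code, the helper names are ours, and $\langle A_s \rangle$ is taken as the mean over all $N^2$ entries of $A_s$, which reproduces the rounded values $-3.3$ and $-7.2$ quoted in the text.

```python
N, T = 8, 10  # nodes per layer, number of layers


def layer_adjacency(s):
    """Adjacency matrix of layer s (1-indexed layers, 0-indexed nodes) of the toy network."""
    edges = [(a, b) for a in range(4) for b in range(a + 1, 4)]  # clique {1,2,3,4}
    if s == 3:
        edges += [(5, 6), (5, 7), (6, 7), (0, 4), (1, 4)]
    elif s == 6:
        edges += [(5, 6), (5, 7), (6, 7), (0, 4), (1, 4), (2, 4), (3, 4)]
    else:
        edges += [(a, b) for a in range(4, 8) for b in range(a + 1, 8)]  # clique {5,6,7,8}
    A = [[0] * N for _ in range(N)]
    for a, b in edges:
        A[a][b] = A[b][a] = 1
    return A


def modularity_matrix(A):
    """U null network with gamma = 1: B_ij = A_ij - <A>, with <A> the mean over all N^2 entries."""
    mean_a = sum(map(sum, A)) / (N * N)
    return [[A[i][j] - mean_a for j in range(N)] for i in range(N)]


B = {s: modularity_matrix(layer_adjacency(s)) for s in range(1, T + 1)}


def delta_q(s):
    """Intra-layer change from moving node 5 out of {1,...,5} and into {6,7,8} in layer s."""
    return 2 * (sum(B[s][4][j] for j in (5, 6, 7)) - sum(B[s][4][j] for j in (0, 1, 2, 3)))


def partition(changed_layers):
    """Labels per layer; node 5 joins the community of nodes 1-4 only in `changed_layers`."""
    return {s: [0, 0, 0, 0, 0 if s in changed_layers else 1, 1, 1, 1] for s in range(1, T + 1)}


def quality(labels, omega):
    """Intra-layer term plus 2 * omega * persistence for a dict of per-layer labels."""
    intra = sum(B[s][i][j] for s in B for i in range(N) for j in range(N)
                if labels[s][i] == labels[s][j])
    pers = sum(labels[s][i] == labels[s + 1][i] for s in range(1, T) for i in range(N))
    return intra + 2 * omega * pers


C1, C2, C3 = partition({3, 6}), partition({6}), partition(set())
omega_ef = -delta_q(3) / 4  # coupling at which panels (e) and (f) tie
omega_fg = -delta_q(6) / 4  # coupling at which panels (f) and (g) tie
```

Under these assumptions, `omega_ef` and `omega_fg` are the two crossing points that separate the regimes in which $C_1$, $C_2$, and $C_3$ have the highest multilayer modularity among the three.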


5.1.2. Shared connectivity patterns. In the previous toy example, the intra-layer partitions induced on each layer by the multilayer partitions in Fig. 5.1(e,f,g) are optimal for at least one layer when $\omega = 0$ (see Fig. 5.1(d)). This second example illustrates how inter-layer coupling can identify intra-layer partitions that are not optimal for any individual layer when $\omega = 0$ but which reflect connectivity patterns that are shared across layers.

In Fig. 5.2, we consider an unweighted multilayer network with $|\mathcal{T}| = 3$ layers and $N = 13$ nodes in each layer. Every $s$th layer contains four 3-node cliques and a node that is connected to each of the three nodes in the $s$th clique and to nodes $10_s$ and $12_s$ in the 4th clique. We show the layers of the multilayer network in panels (a)-(c). We examine its communities using a U null network with a resolution-parameter value of $\gamma = 1$. The optimal partition in each layer is unique and is $\{\{1_1, 2_1, 3_1, 13_1\}, \{4_1, 5_1, 6_1\}, \{7_1, 8_1, 9_1\}, \{10_1, 11_1, 12_1\}\}$ for layer 1, $\{\{1_2, 2_2, 3_2\}, \{4_2, 5_2, 6_2, 13_2\}, \{7_2, 8_2, 9_2\}, \{10_2, 11_2, 12_2\}\}$ for layer 2, and $\{\{1_3, 2_3, 3_3\}, \{4_3, 5_3, 6_3\}, \{7_3, 8_3, 9_3, 13_3\}, \{10_3, 11_3, 12_3\}\}$ for layer 3. We obtain the multilayer partition $C_1$ in panel (d) by combining these sets such that induced intra-layer partitions are optimal for each layer when $\omega = 0$ and persistence is maximized between layers. The multilayer partition $C_2$ in panel (e) reflects connectivity patterns that are shared by all layers (i.e., node $13_s$ is with the fourth 3-node clique instead of the $s$th 3-node clique), but its intra-layer partitions are not optimal for any layer when $\omega = 0$. By carrying out similar calculations to those in the previous toy example, one can show that when $\omega > 3/2$ (that is, when $4\omega + 6[2(1 - \langle A_s \rangle) - \langle A_s \rangle - 3(1 - \langle A_s \rangle)] > 0$, with $\langle A_1 \rangle = \langle A_2 \rangle = \langle A_3 \rangle$ by construction in this example), the multilayer partition in panel (e) yields a higher modularity value than the multilayer partition in panel (d). We illustrate this result in Fig. 5.2(f) by plotting $Q(C_2|\mathcal{B}) - Q(C_1|\mathcal{B})$ against $\omega$. This example is a simple illustration of how inter-layer connections can help identify connectivity patterns that are shared across
[Figure 5.2, panels: (a) Layer 1; (b) Layer 2; (c) Layer 3; (d) Partition $C_1$; (e) Partition $C_2$; (f) Change in multilayer modularity value with respect to partition $C_1$.]


Fig. 5.2. Toy example illustrating the use of ordinal diagonal and uniform inter-layer coupling for detecting shared connectivity patterns across layers. We consider three layers ($|\mathcal{T}| = 3$) with thirteen nodes ($N = 13$) in each layer. We show the network structures in (a) layer 1, (b) layer 2, and (c) layer 3. Solid lines represent edges present in all three layers and dashed lines represent edges that are only present in one of the layers. Panels (d) and (e) illustrate two different multilayer partitions. In each panel, the $s$th column of circles represents the nodes in the $s$th layer, which we order from 1 to 13. We show sets of nodes in the same community using colors in panels (d) and (e). In panel (f), we show the difference between the multilayer modularity value of the partition in panel (e) and that of the partition in panel (d) for different values of $\omega$. We include the horizontal dotted line to show the point at which the line intercepts the horizontal axis. The panel labels in the regions defined by the area between two consecutive vertical lines in panel (f) indicate which of the multilayer partitions in panels (d) and (e) has a higher value of multilayer modularity.


layers. 
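The threshold $\omega > 3/2$ in this example can be recovered with a short calculation. The derivation below is a sketch that assumes the two partitions differ only in the assignment of node $13_s$, so that $\mathrm{Pers}(C_2) - \mathrm{Pers}(C_1) = 2$ (node 13 changes community between layers 1 and 2 and between layers 2 and 3 in $C_1$, and never in $C_2$). In each layer, node $13_s$ has two edges and one non-edge to the fourth clique and three edges to the $s$th clique.

```latex
\begin{align*}
Q(C_2|\mathcal{B}) - Q(C_1|\mathcal{B})
  &= \sum_{s=1}^{3} 2\Big[ \underbrace{2(1 - \langle A_s \rangle) - \langle A_s \rangle}_{\text{node } 13_s \text{ with clique 4}}
     - \underbrace{3(1 - \langle A_s \rangle)}_{\text{node } 13_s \text{ with clique } s} \Big]
     + 2\omega \big[ \mathrm{Pers}(C_2) - \mathrm{Pers}(C_1) \big] \\
  &= 3 \cdot 2 \cdot (-1) + 2\omega \cdot 2 \\
  &= 4\omega - 6 ,
\end{align*}
```

The difference is positive exactly when $\omega > 3/2$, and the terms in $\langle A_s \rangle$ cancel, so the threshold does not depend on the mean edge weight.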

5.2. Some properties of multilayer partitions. We now ask how introducing positive ordinal diagonal and uniform coupling (i.e., $\omega > 0$) alters the set of maximum-modularity partitions of static networks (i.e., the case $\omega = 0$). To clearly differentiate between intra-layer and inter-layer modularity contributions, we denote the quality function by

$$Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega) = \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c_{i_s}, c_{j_s}) + 2\omega\, \mathrm{Pers}(C)$$

instead of $Q(C|\mathcal{B})$. Let $\mathcal{C}_{\max}(\omega)$ denote the set of optimal partitions for the multilayer modularity-maximization problem (5.3), and let $C^{\omega}_{\max}$ be an arbitrary partition in $\mathcal{C}_{\max}(\omega)$. We prove several propositions that hold for an arbitrary choice of the matrices $B_s$ (for example, if one uses the modularity quality function with a U null network and a resolution parameter value of 1, then $B_s = A_s - \langle A_s \rangle \mathbf{1}_N$, where $\mathbf{1}_N$ is the $N \times N$ matrix of ones).

Proposition 5.1. $\mathrm{Pers}(C^{\omega}_{\max}) > 0 \Leftrightarrow \omega > 0$.


Proposition 5.1 ensures that as soon as (and only when) the value of $\omega$ is strictly positive, the value of persistence of an optimal solution is also positive. To prove this, it suffices to observe that if one rearranges sets in a multilayer partition by combining some of the sets into the same set without changing the partitions induced on individual layers, then one only changes the value of persistence in the expression of multilayer modularity. For example, this phenomenon occurs in Fig. 5.1 when going from the partition in panel (d) to the partition in panel (e).


Proof.

$\Rightarrow$: We prove the contrapositive. Assume that $\omega = 0$ and consider a multilayer partition $C$ such that $\mathrm{Pers}(C) > 0$. The partition $C$ contains at least one set with disconnected components (because $\mathrm{Pers}(C) > 0$ and nodes in different layers are not connected), and $C$ is not optimal by our assumption that optimal partitions do not contain disconnected components.

$\Leftarrow$: Assume that $\omega > 0$ and consider a multilayer partition $C$ such that $\mathrm{Pers}(C) = 0$. We will show that $C$ is not optimal. Let $C' = \bigcup_{s=1}^{|\mathcal{T}|} C|_s$. It then follows that

$$Q(C'|B_1, \ldots, B_{|\mathcal{T}|}; \omega) = Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega).$$

Choose a node $i_r$ at random and let $C'_{i_r}$ denote the set in $C'$ that contains $i_r$. Now let $C''$ be the partition obtained from $C'$ by combining all sets that contain $i_s$, for some $s$, into one set:

$$C'' = \left( C' \setminus \bigcup_{s=1}^{|\mathcal{T}|} \{C'_{i_s}\} \right) \cup \left\{ \bigcup_{s=1}^{|\mathcal{T}|} C'_{i_s} \right\}.$$

Consequently,

$$Q(C''|B_1, \ldots, B_{|\mathcal{T}|}; \omega) \geq Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega) + 2\omega(|\mathcal{T}| - 1),$$

so $C$ is not optimal. (Note that $|\mathcal{T}| \geq 2$ for a network with more than a single layer, so $2\omega(|\mathcal{T}| - 1)$ is strictly positive for $\omega > 0$.) □

Proposition 5.2. If $C_l|_r = \emptyset$ for some $r \in \{1, \ldots, |\mathcal{T}| - 1\}$, then $C_l|_s = \emptyset$ for all $s > r$, where $C_l \in C^{\omega}_{\max}$ and $C^{\omega}_{\max} \in \mathcal{C}_{\max}(\omega)$.

Proposition 5.2 ensures that if a community becomes empty in a given layer, then it remains empty in all subsequent layers. We omit the proof, as this result follows directly from the sparsity pattern of $\mathcal{B}$ and our assumption that optimal solutions do not contain disconnected components.


Proposition 5.3. $C^{\omega}_{\max}|_s = C^{\omega}_{\max}|_{s+1} \Leftrightarrow \mathrm{Pers}(C^{\omega}_{\max})|_s = N$.

Proposition 5.3 connects the notion of persistence between a pair of layers to the notion of change in community structure within layers. Various numerical experiments that have been performed with ordinal diagonal and uniform inter-layer coupling consist of varying the value of $\omega$ and using information about when nodes change communities between layers as an indication of change in community structure within
these layers [6{p^[44]. The equivalence relation in Proposition 5.3 motivates the use of 
Pers(C')|s (or a variant thereof) as an indication of intra-layer change in community 
structure. 


Proof.

$\Leftarrow$: This follows straightforwardly by transitivity of community assignments: if $\delta(c_{i_s}, c_{i_{s+1}}) = 1$ for all $i$, then $\delta(c_{i_s}, c_{j_s}) = 1$ if and only if $\delta(c_{i_{s+1}}, c_{j_{s+1}}) = 1$ for all $i, j$. (This direction holds for any multilayer partition; it need not be optimal.)

$\Rightarrow$: Let $C$ be a multilayer partition that does not contain disconnected components such that $C|_s = C|_{s+1}$ and $\mathrm{Pers}(C)|_s < N$ for some $s \in \{1, \ldots, |\mathcal{T}| - 1\}$. We show that $C$ is not optimal. Consider a set $C_l \in C$ such that $C_l|_s \neq \emptyset$. If $\delta(c_{i_s}, c_{i_{s+1}}) = 1$ (respectively, $\delta(c_{i_s}, c_{i_{s+1}}) = 0$) for some $i_s \in C_l|_s$, then $\delta(c_{j_s}, c_{j_{s+1}}) = 1$ (respectively, $\delta(c_{j_s}, c_{j_{s+1}}) = 0$) for all $j_s \in C_l|_s$ by transitivity of community assignments and because $C|_s = C|_{s+1}$ by hypothesis. Because $\mathrm{Pers}(C)|_s < N$ by hypothesis, there exists at least one set of nodes $C_l|_s$, with $C_l \in C$, such that $\delta(c_{i_s}, c_{i_{s+1}}) = 0$ for all $i_s \in C_l|_s$. Let $C_k|_{s+1}$ denote the set of nodes in layer $s+1$ that contains $i_{s+1}$ for all $i_s \in C_l|_s$. Consider the set of nodes $\bigcup_{r \leq s} C_l|_r$ in $C_l$ that are in layers $\{1, \ldots, s\}$ and the set of nodes $\bigcup_{r > s} C_k|_r$ in $C_k$ that are in layers $\{s+1, \ldots, |\mathcal{T}|\}$. Because $\delta(c_{i_s}, c_{i_{s+1}}) = 0$ for all $i_s \in C_l|_s$ and by Proposition 5.2, it follows that $C_l = \bigcup_{r \leq s} C_l|_r$ and $C_k = \bigcup_{r > s} C_k|_r$. Define the partition $C'$ by

$$C' = \big( C \setminus (\{C_l\} \cup \{C_k\}) \big) \cup \{C_l \cup C_k\}.$$

This partition satisfies $C'|_r = C|_r$ for all $r \in \{1, \ldots, |\mathcal{T}|\}$, $\mathrm{Pers}(C')|_r = \mathrm{Pers}(C)|_r$ for all $r \neq s$, and $\mathrm{Pers}(C')|_s > \mathrm{Pers}(C)|_s$. It follows that $Q(C'|B_1, \ldots, B_{|\mathcal{T}|}; \omega) > Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega)$ and $C$ is not optimal. □


The preceding propositions apply to an optimal partition obtained with any positive value of $\omega$. The next two propositions concern the existence of “boundary” values for $\omega$.


Proposition 5.4. There exists $\omega_0 > 0$ such that

$$\text{if } \omega < \omega_0, \text{ then } \bigcup_{s=1}^{|\mathcal{T}|} C^{\omega}_{\max}|_s \in \mathcal{C}_{\max}(0).$$

Proposition 5.4 reinforces the idea of thinking of $\omega$ as the cost of breaking static community structure within layers in favor of larger values of persistence across layers. It demonstrates that there is a positive value of inter-layer coupling such that for any smaller coupling, multilayer modularity maximization only gives more information than single-layer modularity maximization in that it identifies the set of partitions in $\mathcal{C}_{\max}(0)$ with highest persistence. The proof of this property relies on the fact that the set of possible modularity values for a given modularity matrix is finite.

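Proof.

Let $C$ be an arbitrary partition such that $\bigcup_{s=1}^{|\mathcal{T}|} C|_s \notin \mathcal{C}_{\max}(0)$. We will show that there exists a value $\omega_0$ of the inter-layer coupling parameter $\omega$ such that $C$ is never optimal for any inter-layer coupling below $\omega_0$. Given a sequence of single-layer modularity matrices $\{B_1, \ldots, B_{|\mathcal{T}|}\}$, the set of possible multilayer modularity values for a fixed value of $\omega \geq 0$ is finite and is given by

$$\mathcal{Q}_{\omega} = \{ Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega),\ C \in \mathcal{C} \}.$$

Let $Q_0^{1} = \max \mathcal{Q}_0$, $Q_0^{2} = \max \mathcal{Q}_0 \setminus \{Q_0^{1}\}$, and $\Delta Q = Q_0^{1} - Q_0^{2} > 0$. By hypothesis,

$$Q(C|B_1, \ldots, B_{|\mathcal{T}|}; 0) \leq Q_0^{2} < Q_0^{1} = Q(C^{0}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; 0),$$

where $C^{0}_{\max} \in \mathcal{C}_{\max}(0)$. Furthermore, by definition of persistence, it follows that

$$Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega) \leq Q(C|B_1, \ldots, B_{|\mathcal{T}|}; 0) + 2\omega N(|\mathcal{T}| - 1)$$

for all values of $\omega$. By choosing $\omega < \omega_0$, with $\omega_0 = \Delta Q / [2N(|\mathcal{T}| - 1)]$, we obtain

$$Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega) \leq Q_0^{2} + 2\omega N(|\mathcal{T}| - 1) < Q_0^{2} + \Delta Q = Q_0^{1} \leq Q(C^{0}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; \omega),$$

so $C$ is not optimal for any inter-layer coupling below $\omega_0$. □

[Footnote to the proof of Proposition 5.3] Imposing $\mathrm{Pers}(C)|_s < N$ alone is not sufficient because changing $\mathrm{Pers}(C)|_s$ can change partitions induced on individual layers.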

Clearly, $\omega_0 = \Delta Q / [2N(|\mathcal{T}| - 1)]$ is not an upper bound for the set $\{\omega \in \mathbb{R}^{+} : \bigcup_{s=1}^{|\mathcal{T}|} C^{\omega}_{\max}|_s \in \mathcal{C}_{\max}(0)\}$, but our main concern is that the smallest upper bound of this set is not zero (in fact, we have shown that it must be at least $\Delta Q / [2N(|\mathcal{T}| - 1)] > 0$).

Proposition 5.5. There exists $\omega_{\infty} > 0$ such that

$$\text{if } \omega > \omega_{\infty}, \text{ then } \mathrm{Pers}(C^{\omega}_{\max})|_s = N \text{ for all } s \in \{1, \ldots, |\mathcal{T}| - 1\}.$$

Proposition 5.5 implies that a sufficiently large value of inter-layer coupling $\omega$ guarantees that $C^{\omega}_{\max}|_s$ remains the same across layers (by Proposition 5.3). The proof of this proposition is similar to the proof of Proposition 5.4.


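Proof.

Let $C$ be an arbitrary partition of a multilayer network such that $\mathrm{Pers}(C)|_s < N$ for some $s \in \{1, \ldots, |\mathcal{T}| - 1\}$. We show that there exists a value $\omega_{\infty} > 0$ of the inter-layer coupling parameter $\omega$ such that $C$ is never optimal for $\omega > \omega_{\infty}$. We first rewrite the quality function as

$$Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega) = \beta_1 + 2\omega \big( N(|\mathcal{T}| - 1) - A \big),$$

where $\beta_1 = \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c_{i_s}, c_{j_s})$ and the integer $A = N(|\mathcal{T}| - 1) - \mathrm{Pers}(C)$ satisfies $A \geq 1$ because $\mathrm{Pers}(C) < N(|\mathcal{T}| - 1)$ by assumption. Now consider the set of values on the diagonal blocks of the multilayer modularity matrix $\mathcal{B}$:

$$B_{\mathrm{diag}} = \{ B_{ijs} \mid i, j \in \{1, \ldots, N\},\ s \in \{1, \ldots, |\mathcal{T}|\} \}, \tag{5.4}$$

and let $\mathrm{Max}(B_{\mathrm{diag}})$ and $\mathrm{Min}(B_{\mathrm{diag}})$, respectively, denote the maximum and minimum values of the set $B_{\mathrm{diag}}$. Let $C'$ be any multilayer partition with a maximal value of persistence. It then follows that

$$Q(C'|B_1, \ldots, B_{|\mathcal{T}|}; \omega) = \beta_2 + 2\omega N(|\mathcal{T}| - 1)$$

for some $\beta_2 \in \mathbb{R}$. Because $A \geq 1$, choosing

$$2\omega > |\mathcal{T}| N^2 \big[ \mathrm{Max}(B_{\mathrm{diag}}) - \mathrm{Min}(B_{\mathrm{diag}}) \big] \geq \beta_1 - \beta_2$$

ensures that $C'$ yields a higher value of multilayer modularity than $C$ for any $\beta_1$ and for all $A \in \{1, \ldots, N(|\mathcal{T}| - 1)\}$. □

[Footnote to Proposition 5.4] For example, one could replace $N(|\mathcal{T}| - 1)$ in the bound of Proposition 5.4 by $N(|\mathcal{T}| - 1) - \mathrm{Pers}(C(0))$, where $\mathrm{Pers}(C(0))$ denotes the maximum value of persistence that one can obtain by combining sets in each partition of $\mathcal{C}_{\max}(0)$ without changing the partitions induced on individual layers. Taking $\omega_0 = \Delta Q / \big[ 2\big( N(|\mathcal{T}| - 1) - \mathrm{Pers}(C(0)) \big) \big]$ satisfies Proposition 5.4, and $\Delta Q / \big[ 2\big( N(|\mathcal{T}| - 1) - \mathrm{Pers}(C(0)) \big) \big] > \Delta Q / [2N(|\mathcal{T}| - 1)]$.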

The following proposition follows directly from Proposition 5.5.

Proposition 5.6. There exists $\omega_{\infty} > 0$ such that

$$C^{\omega}_{\max}|_s \text{ is a solution of } \max_{C} Q\left( C \,\Big|\, \frac{1}{|\mathcal{T}|} \sum_{s=1}^{|\mathcal{T}|} B_s \right)$$

for all $\omega > \omega_{\infty}$.

Propositions 5.5 and 5.6 imply the existence of a “boundary value” for $\omega$ above which single-layer partitions induced by optimal multilayer partitions (1) are the same on all layers and (2) are optimal solutions for the single-layer modularity-maximization problem defined on the mean modularity matrix.


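Proof.

Suppose that $\omega > \omega_{\infty}$, where $\omega_{\infty}$ is as defined in Proposition 5.5, and let $C^{\omega}_{\max} \in \mathcal{C}_{\max}(\omega)$. By Proposition 5.5, it then follows that $\mathrm{Pers}(C^{\omega}_{\max}) = N(|\mathcal{T}| - 1)$ and community assignments in $C^{\omega}_{\max}$ are the same across layers. Consequently, for $\omega > \omega_{\infty}$,

$$\begin{aligned}
& \max_{C \in \mathcal{C}} \left[ \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c_{i_s}, c_{j_s}) + 2\omega\, \mathrm{Pers}(C) \right] \\
\Leftrightarrow\ & \max_{C \in \mathcal{C}} \left[ \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c_i, c_j) + 2\omega N(|\mathcal{T}| - 1) \right] \\
\Leftrightarrow\ & \max_{C \in \mathcal{C}} \sum_{i,j=1}^{N} \left( \sum_{s=1}^{|\mathcal{T}|} B_{ijs} \right) \delta(c_i, c_j),
\end{aligned}$$

where $c_i$ denotes the community assignment of node $i$ in all layers. The last problem has the same maximizers as the problem defined on the mean modularity matrix, which differs only by the positive factor $1/|\mathcal{T}|$. □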
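For very small networks, the reduced single-layer problem of Proposition 5.6 can be solved by exhaustive search. The sketch below is illustrative only: the two 4-node layers are hypothetical, the helper names are ours, we assume a U null network with $\gamma = 1$, and brute-force enumeration of all set partitions is feasible only for a handful of nodes.

```python
def set_partitions(n):
    """Yield every partition of n nodes as a tuple of labels in restricted-growth form."""
    def extend(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))
    yield from extend([], -1)


def uniform_modularity_matrix(A):
    """U null network with gamma = 1: B_ij = A_ij - <A>."""
    n = len(A)
    mean_a = sum(map(sum, A)) / (n * n)
    return [[A[i][j] - mean_a for j in range(n)] for i in range(n)]


def quality(B, labels):
    """Single-layer quality: sum of B_ij over pairs (i, j) with the same label."""
    n = len(labels)
    return sum(B[i][j] for i in range(n) for j in range(n) if labels[i] == labels[j])


# Hypothetical two-layer network on four nodes (illustrative only).
A1 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # path 1-2-3-4
A2 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # edges 1-2 and 3-4
layers = [uniform_modularity_matrix(A) for A in (A1, A2)]

# Mean modularity matrix over layers.
n = 4
B_mean = [[sum(B[i][j] for B in layers) / len(layers) for j in range(n)] for i in range(n)]

best = max(set_partitions(n), key=lambda labels: quality(B_mean, labels))
```

In this example the search returns the labels `(0, 0, 1, 1)`, that is, the partition $\{\{1, 2\}, \{3, 4\}\}$, which is the layer-independent assignment that a sufficiently large $\omega$ would impose on every layer.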

The next two propositions formalize the intuition that an optimal multilayer partition measures a trade-off between static community structure within layers and persistence of community structure across layers.

Proposition 5.7. Let $\omega_1 > \omega_2 > 0$. For all $C^{\omega_2}_{\max} \in \mathcal{C}_{\max}(\omega_2)$, one of the following two conditions must hold:

(1) $C^{\omega_2}_{\max} \in \mathcal{C}_{\max}(\omega_1)$,

or (2) $\mathrm{Pers}(C^{\omega_2}_{\max}) < \mathrm{Pers}(C^{\omega_1}_{\max})$ for all $C^{\omega_1}_{\max} \in \mathcal{C}_{\max}(\omega_1)$.
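Proof.

Let $C^{\omega_2}_{\max} \in \mathcal{C}_{\max}(\omega_2)$. If $C^{\omega_2}_{\max} \in \mathcal{C}_{\max}(\omega_1)$, then condition (1) is satisfied. Suppose that $C^{\omega_2}_{\max} \notin \mathcal{C}_{\max}(\omega_1)$, and assume that $\mathrm{Pers}(C^{\omega_2}_{\max}) \geq \mathrm{Pers}(C^{\omega_1}_{\max})$ for some $C^{\omega_1}_{\max} \in \mathcal{C}_{\max}(\omega_1)$. By definition of optimality, $C^{\omega_2}_{\max} \notin \mathcal{C}_{\max}(\omega_1)$ implies that

$$Q(C^{\omega_2}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; \omega_1) < Q(C^{\omega_1}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; \omega_1), \tag{5.5}$$

where $\omega_1 > \omega_2$ by hypothesis. By writing

$$Q(C^{\omega_k}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; \omega_{k'}) = \sum_{s=1}^{|\mathcal{T}|} \sum_{i,j=1}^{N} B_{ijs}\, \delta(c^{\omega_k}_{i_s}, c^{\omega_k}_{j_s}) + 2\omega_{k'}\, \mathrm{Pers}(C^{\omega_k}_{\max}),$$

where $c^{\omega_k}_{i_s}$ is the community assignment of node $i_s$ in $C^{\omega_k}_{\max}$ and $k, k' \in \{1, 2\}$, and by substituting $\omega_1$ by $\omega_2 + \Delta$ for some $\Delta > 0$, one can show that the inequality (5.5) implies

$$Q(C^{\omega_2}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; \omega_2) < Q(C^{\omega_1}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; \omega_2),$$

which contradicts the optimality of $C^{\omega_2}_{\max}$. □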

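One can similarly prove the following proposition.

Proposition 5.8. Let $\omega_1 > \omega_2 > 0$. For all $C^{\omega_2}_{\max} \in \mathcal{C}_{\max}(\omega_2)$, one of the following two conditions must hold:

(1) $C^{\omega_2}_{\max} \in \mathcal{C}_{\max}(\omega_1)$,

or (2) $Q(C^{\omega_2}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; 0) > Q(C^{\omega_1}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; 0)$ for all $C^{\omega_1}_{\max} \in \mathcal{C}_{\max}(\omega_1)$.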


(To visualize Propositions 5.7 and 5.8 graphically, it is helpful to think of a multilayer quality function $Q(C|B_1, \ldots, B_{|\mathcal{T}|}; \omega)$ for a given partition $C$ as a linear function of $\omega$ with slope $2\,\mathrm{Pers}(C)$ that crosses the vertical axis at $Q(C|B_1, \ldots, B_{|\mathcal{T}|}; 0)$ [see, e.g., the last panels of Fig. 5.1 and Fig. 5.2].) The next three corollaries follow straightforwardly from Propositions 5.7 and 5.8. The first states that the highest achievable value of persistence for an optimal partition obtained with a given value of inter-layer coupling is a non-decreasing function of $\omega$. The second states that the highest achievable value of intra-layer modularity contributions for an optimal partition obtained with a given value of inter-layer coupling is a non-increasing function of $\omega$. The third states that if two distinct values of $\omega$ have the same set of optimal partitions, then this set is also optimal for all intermediate values.

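Corollary 5.9. Let $\omega_1 > \omega_2$. Then

$$\mathrm{Pers}(\mathcal{C}_{\max}(\omega_1)) \geq \mathrm{Pers}(\mathcal{C}_{\max}(\omega_2)),$$

where $\mathrm{Pers}(\mathcal{C}_{\max}(\omega)) := \max \{ \mathrm{Pers}(C^{\omega}_{\max}),\ C^{\omega}_{\max} \in \mathcal{C}_{\max}(\omega) \}$.

Corollary 5.10. Let $\omega_1 > \omega_2$. Then

$$Q(\mathcal{C}_{\max}(\omega_1)|B_1, \ldots, B_{|\mathcal{T}|}; 0) \leq Q(\mathcal{C}_{\max}(\omega_2)|B_1, \ldots, B_{|\mathcal{T}|}; 0),$$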
Fig. 5.3. Toy example illustrating the effect of post-processing on a multilayer partition by increasing multilayer modularity via community-assignment swaps that increase the value of persistence but do not change intra-layer partitions. The colors in panels (a)-(c) scale with the entries of the adjacency matrix. Panel (d) (respectively, panel (e)) represents the output multilayer partition obtained with Louvain before (respectively, after) post-processing. The horizontal axis represents the layers, and the vertical axis represents the nodes. The shading in panels (d,e) represents the community assignments of nodes in each layer.


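where $Q(\mathcal{C}_{\max}(\omega)|B_1, \ldots, B_{|\mathcal{T}|}; 0) := \max \{ Q(C^{\omega}_{\max}|B_1, \ldots, B_{|\mathcal{T}|}; 0),\ C^{\omega}_{\max} \in \mathcal{C}_{\max}(\omega) \}$.


Corollary 5.11. Assume that $\mathcal{C}_{\max}(\omega_1) = \mathcal{C}_{\max}(\omega_2)$ for $\omega_1 > \omega_2$. Then

$$\mathcal{C}_{\max}(\omega_1) = \mathcal{C}_{\max}(\omega) = \mathcal{C}_{\max}(\omega_2) \text{ for all } \omega \in (\omega_2, \omega_1).$$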


One can extend the proofs of Propositions 5.1 to 5.7 so that they apply for inter-layer coupling that is uniform between each pair of contiguous layers but may differ from pair to pair.
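The monotonicity statements in Corollaries 5.9 and 5.10 can be illustrated with the three candidate partitions of the toy example in Section 5.1.1, each viewed as a line in $\omega$. This is only a sketch: it restricts attention to $C_1$, $C_2$, and $C_3$ rather than to all partitions, it measures intercepts relative to $C_1$, and it uses the values $\Delta Q(3) = -3.3125$ and $\Delta Q(6) = -7.1875$ that follow from taking $\langle A_s \rangle$ as the mean over all $N^2$ entries.

```python
# (intercept relative to C1, persistence) for the three candidate partitions of Fig. 5.1.
# The intercepts use Delta Q(3) = -3.3125 and Delta Q(6) = -7.1875
# (about -3.3 and -7.2 in the text).
candidates = {
    "C1": (0.0, 68),
    "C2": (-3.3125, 70),
    "C3": (-3.3125 - 7.1875, 72),
}


def best_candidate(omega):
    """Name of the candidate with the largest value of intercept + 2 * omega * persistence."""
    return max(candidates, key=lambda name: candidates[name][0] + 2 * omega * candidates[name][1])


omegas = [k / 100 for k in range(0, 301)]
winners = [best_candidate(w) for w in omegas]
persistences = [candidates[name][1] for name in winners]
intercepts = [candidates[name][0] for name in winners]

# Along increasing omega, the winner's persistence never decreases
# and its intra-layer contribution never increases.
persistence_is_nondecreasing = all(a <= b for a, b in zip(persistences, persistences[1:]))
intercept_is_nonincreasing = all(a >= b for a, b in zip(intercepts, intercepts[1:]))
```

Because each candidate is a line whose slope is proportional to its persistence, the upper envelope of the lines can only switch to steeper lines as $\omega$ grows, which is the geometric content of the two corollaries.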

5.3. Implementation issues. We now examine issues that can arise when using the Louvain heuristic (see Section 2.2) to maximize multilayer modularity (3.3).


5.3.1. Underemphasis of persistence. Consider the example network in Fig. 5.3, which is a 3-layer network that has 5 nodes in each layer. Suppose that all nodes are strongly connected to each other in layers 1 and 3, and that the edge weight between node $1_2$ and nodes $\{2_2, 3_2, 4_2, 5_2\}$ is smaller in layer 2 than the edge weight between node $1_s$ and nodes $\{2_s, 3_s, 4_s, 5_s\}$ when $s = 1, 3$. We use the uniform null network with $\gamma = 0.5$ and set $\omega = 0.1$. This produces a multilayer modularity matrix in which all the single-layer modularity entries $B_{ijs}$ except those of node $1_2$ are positive and exceed the value of inter-layer coupling. Suppose that one loops over the nodes ordered from 1 to $N|\mathcal{T}|$ in the first phase of the Louvain heuristic. The initial partition consists of $N|\mathcal{T}|$ singletons, and each node is then moved to the set that maximally increases modularity. The partition at the end of phase 1 is $\{\{1_1, 2_1, 3_1, 4_1, 5_1, 1_2\}, \{2_2, 3_2, 4_2, 5_2\}, \{1_3, 2_3, 3_3, 4_3, 5_3\}\}$. In phase 2, the second and third sets merge to form a single set, and the Louvain heuristic gets trapped in a local optimum in which the smaller set of nodes (i.e., $\{1_1\}$) remains in the same community across layers 1 and 2 and the larger set of nodes (i.e., $\{2_1, 3_1, 4_1, 5_1\}$) changes community. We show this multilayer partition in Fig. 5.3(d). Repeating this experiment 1000 times using a randomized node order at the start of each iteration of phase 1 of the Louvain heuristic yields the same multilayer partition. One can modify this multilayer partition to obtain a new partition with a higher value of multilayer modularity by increasing the value of persistence across layers without changing intra-layer partitions (we use this idea in the proof of Proposition 5.1). We show an example of this situation in Fig. 5.3(e).

In Fig. 5.3(d), we illustrate the above issue visually via abrupt changes in colors between layers (these are more noticeable in larger networks). Such changes are misleading because they imply a strong decrease in persistence that might not be accompanied by a strong change in intra-layer partitions. In Fig. 5.3(d), for example, the intra-layer partitions differ in the community assignment of only a single node. To mitigate this problem, we apply a post-processing function to all output partitions that increases persistence between layers without changing the partitions that are induced on each layer (thereby producing a partition with a higher value of multilayer modularity). We do this by relabeling the community assignments of nodes in each layer such that (1) the number of nodes that remain in the same community between consecutive layers is increased and (2) the partition induced on each layer by the original multilayer partition is unchanged.
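One possible way to realize such a relabeling is sketched below. The text does not specify the procedure, so everything here is an assumption: the function names are ours, labels are integers stored as `labels[s][i]`, and the matching between communities in consecutive layers is a simple greedy rule based on node overlap (an optimal assignment method could be substituted). The function falls back to the input if the greedy pass does not help.

```python
from collections import Counter
from itertools import count


def persistence(labels):
    """Number of nodes that keep their label between consecutive layers, summed over layers."""
    return sum(a == b for prev, cur in zip(labels, labels[1:]) for a, b in zip(prev, cur))


def relabel_for_persistence(labels):
    """Greedily rename the communities of each layer to match the previous layer.

    Within a layer the renaming is one-to-one, so the partition induced on
    every layer is unchanged; only the identification across layers changes.
    """
    fresh = count(start=1 + max(max(row) for row in labels))  # labels never used before
    out = [list(labels[0])]
    for cur in labels[1:]:
        prev = out[-1]
        # (label in current layer, label in previous layer) -> number of shared nodes
        overlap = Counter(zip(cur, prev))
        mapping, used = {}, set()
        for (old, target), _ in overlap.most_common():
            if old not in mapping and target not in used:
                mapping[old] = target
                used.add(target)
        for old in set(cur):
            if old not in mapping:
                mapping[old] = next(fresh)
        out.append([mapping[c] for c in cur])
    # Keep the original labels if the greedy pass did not help.
    return out if persistence(out) >= persistence(labels) else [list(r) for r in labels]


# Partition of Fig. 5.3(d): node 1 stays with layer 1, nodes 2-5 of layer 2 join layer 3.
before = [[0, 0, 0, 0, 0], [0, 1, 1, 1, 1], [1, 1, 1, 1, 1]]
after = relabel_for_persistence(before)
```

For the partition described in Section 5.3.1, this greedy pass keeps the larger set of nodes in one community across all three layers and isolates node $1_2$, which raises persistence while leaving each intra-layer partition as it was.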


5.3.2. Abrupt drop in the number of intra-layer merges. The Louvain 
heuristic faces a second problem in multilayer networks. When the value of
inter-layer coupling satisfies

$$\omega > \mathrm{Max}(B_{\mathrm{diag}}), \tag{5.6}$$

where $B_{\mathrm{diag}}$ is the set of values on the diagonal blocks of $B$ defined in
equation (5.4), the inter-layer contributions to multilayer modularity are larger than
the intra-layer contributions for all pairs of nodes. Consequently, only inter-layer
merges occur during the convergence of phase 1 in the first iteration of the Louvain
heuristic. In Fig. 5.4(a), we illustrate this phenomenon using data set 1. The mean
number of intra-layer merges drops from roughly $N = 98$ (almost every node has at
least one other node from the same layer in its community) to 0. For $\omega$ values
larger than $\mathrm{Max}(B_{\mathrm{diag}})$, every set at the end of the first iteration
of phase 1 contains only copies of the same node in different layers and, in
particular, does not contain nodes from the same layer. This can yield abrupt changes
in the partitions induced on individual layers of the output multilayer partition.
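
The threshold in (5.6) can be illustrated with a short sketch. It assumes that every
node-layer starts in its own set, that merging two singletons changes the quality
function in proportion to the corresponding entry of the multilayer matrix (an
intra-layer entry $B_{ijs}$ or the inter-layer coupling $\omega$), and that each
per-layer matrix is given as a nested list. The function names are hypothetical, and
ties are broken in favor of intra-layer moves purely for illustration.

```python
def max_diag_block_entry(B_layers):
    """Largest entry over all per-layer matrices (the diagonal blocks)."""
    return max(value for B in B_layers for row in B for value in row)


def only_inter_layer_merges(B_layers, omega):
    """Return True when condition (5.6), omega > Max(B_diag), holds."""
    return omega > max_diag_block_entry(B_layers)


def best_first_move(B_layers, omega, i, s):
    """Classify the best first move of node i in layer s from singletons.

    Returns "intra", "inter", or "stay" (when no merge increases the objective).
    """
    B = B_layers[s]
    intra = max(B[i][j] for j in range(len(B)) if j != i)
    # A copy of node i in an adjacent layer exists only if there are >= 2 layers.
    inter = omega if len(B_layers) > 1 else float("-inf")
    if max(intra, inter) <= 0:
        return "stay"
    return "inter" if inter > intra else "intra"
```

With per-layer matrices whose largest entry is 0.6, a coupling of 0.5 still allows
some intra-layer first moves, whereas any coupling above 0.6 makes every first move
an inter-layer one.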

In Fig. 5.4(c), we show an example using data set 1 of how the above issue
can lead to an abrupt change in a quantitative measure computed from a multilayer
output partition obtained with the Louvain heuristic. Note that the mean size of
the sets after convergence of phase 1 in Fig. 5.4(a) is relatively small for data set 1.
(The mean is 3 nodes per set, and the maximum possible number of nodes per set is
$|T| = 238$.) Nevertheless, there is a sudden drop in the value of
$1 - \mathrm{Pers}(C)|_s/N$ between consecutive layers at
$\omega = \mathrm{Max}(B_{\mathrm{diag}})$ [see Fig. 5.4(c)]. Nonzero values of
$1 - \mathrm{Pers}(C)|_s/N$ indicate that community assignments have changed between
layers $s$ and $s + 1$ (by proposition 5.3).

Footnote: Note that combining the first and second sets into a single set decreases
modularity because the value of inter-layer coupling is too small to compensate for
the decrease in intra-layer contributions to modularity.

Fig. 5.4. Comparison between the Louvain and LouvainRand algorithms. The sample of
inter-layer coupling values is the set {0, 0.02, ..., 0.98, 1} with a discretization
step of 0.02 between each pair of consecutive values. (a,b) The number of nodes that
have been merged with at least one node from the same layer after convergence of the
first phase of (a) the Louvain heuristic and (b) the LouvainRand heuristic. For each
heuristic, we average this value over $|T| = 238$ layers and 100 iterations. The error
bars in panels (a,b) indicate standard deviations. (c,d) The value of
$1 - \mathrm{Pers}(C)|_s/N$ averaged over 100 runs of (c) the Louvain heuristic and
(d) the LouvainRand heuristic after convergence of the algorithms to a local optimum.

This problem manifests itself when the values of inter-layer coupling are large
relative to the entries of $B_{\mathrm{diag}}$. In the correlation multilayer networks
that we consider (or in unweighted multilayer networks), entries of the adjacency
matrix satisfy $|A_{ijs}| \leq 1$. Assuming that one uses the modularity quality
function on each layer and that $P_{ijs} \geq 0$ (e.g., $P_{ijs} = \langle A_s \rangle$),
this implies that $\mathrm{Max}(B_{\mathrm{diag}}) \leq 1$ for all
$\gamma \in [\gamma^{-}, \gamma^{+}]$. For networks in which the modularity cost of
changing intra-layer partitions in favor of persistence is large in comparison to the
values of $\mathrm{Max}(B_{\mathrm{diag}})$, it might be desirable to use $\omega > 1$
to gain insight into a network's multilayer community structure (e.g., this occurs in
both toy examples of Section 5.1).
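
The bound on the diagonal blocks can be checked in one line. The check assumes, as is
standard for the modularity quality function but not restated in this section, that
the single-layer entries have the form $B_{ijs} = A_{ijs} - \gamma P_{ijs}$ with
$\gamma \geq 0$. Under that assumption, no entry of a diagonal block exceeds 1:

$$B_{ijs} = A_{ijs} - \gamma P_{ijs} \;\leq\; A_{ijs} \;\leq\; |A_{ijs}| \;\leq\; 1
\quad\Longrightarrow\quad \mathrm{Max}(B_{\mathrm{diag}}) \leq 1 .$$

Any inter-layer coupling above 1 therefore satisfies condition (5.6) for every
admissible resolution parameter in such networks.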


To mitigate this problem, we change the condition for merging nodes in the Louvain
heuristic. Instead of moving a node to a community that maximally increases
modularity, we move a node to a community chosen uniformly at random from those
that increase modularity. We call this heuristic LouvainRand [^, and we illustrate
the results of using it in Figs. 5.4(b,d). Although LouvainRand increases the output
variability (by increasing the search space of the optimization process), it seems
to mitigate the problem for multilayer networks with ordinal diagonal and uniform
coupling.

Fig. 5.5. Numerical experiments with data set 1. We sample the set of inter-layer edge
weights uniformly from the interval [0, 50] with a discretization step of 0.1 (so there
are 501 values of $\omega$ in total), and we use the uniform null network (i.e.,
$P_{ijs} = \langle A_s \rangle$) with $\gamma = 1$. (a) The persistence normalized by
$N(|T| - 1)$ for each value of $\omega$ averaged over 20 runs of LouvainRand. (b) The
intra-layer modularity contributions
$\sum_{s=1}^{|T|} \sum_{i,j=1}^{N} B_{ijs}\,\delta(c_{i_s}, c_{j_s})$ normalized by
$\sum_{s=1}^{|T|} \sum_{i,j=1}^{N} A_{ijs}$ for each value of $\omega$ averaged over 20
runs of LouvainRand. (c) Sample output multilayer partition. Each point on the
horizontal axis represents a single time window, and each position on the vertical
axis is an asset. We order the assets by asset class, and the colors represent
communities. (d) Association matrix of normalized persistence values between all pairs
of layers averaged over all values of $\omega \in [0, 50]$ in our sample and 20 runs
for each value. The normalized persistence between a pair of layers $\{s, r\}$ is
$\sum_{i=1}^{N} \delta(c_{i_s}, c_{i_r})/N$. (e) Association matrix indicating the
co-classification of nodes averaged over the set of partitions induced on each layer
for each value of $\omega$ and 20 runs of LouvainRand.
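
The difference between the two move rules can be sketched as follows. The sketch
assumes that the modularity gain of moving a node into each candidate community has
already been computed and stored in a dictionary. The function names and this
interface are hypothetical and are not taken from any existing implementation.

```python
import random


def louvain_choice(gains):
    """Louvain rule: pick the community with the largest positive gain.

    gains maps a candidate community to the modularity gain of moving there.
    Returns None when no move increases modularity.
    """
    best = max(gains, key=gains.get, default=None)
    if best is None or gains[best] <= 0:
        return None
    return best


def louvain_rand_choice(gains, rng=random):
    """LouvainRand rule: pick uniformly at random among improving communities."""
    improving = [community for community, gain in gains.items() if gain > 0]
    return rng.choice(improving) if improving else None
```

With the hypothetical gains `{"a": 0.5, "b": 0.1, "c": -0.2}`, the first rule always
returns `"a"`, whereas the second returns `"a"` or `"b"` with equal probability. This
is why a node is no longer forced into an inter-layer merge when the coupling exceeds
every intra-layer contribution.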

5.4. Multilayer community structure in asset correlation networks. In this
section, we show the results of computational experiments in which we fix the
value of the resolution parameter $\gamma$ and vary the value of inter-layer coupling
$\omega$. We use the uniform null network (i.e., $P_{ijs} = \langle A_s \rangle$) and
set $\gamma = 1$. We use the LouvainRand heuristic to identify multilayer partitions,
and we apply our post-processing procedure (which increases persistence without
changing the partitions induced on individual layers) to all output multilayer
partitions. We showed in proposition 5.5 that for
$2\omega > \omega_\infty = N^2[\mathrm{Max}(B_{\mathrm{diag}}) - \mathrm{Min}(B_{\mathrm{diag}})]$,
the set $\mathcal{C}_{\max}(\omega)$ of global optima no longer changes and every
optimal partition in this set has maximal persistence. In our example,
$N^2[\mathrm{Max}(B_{\mathrm{diag}}) - \mathrm{Min}(B_{\mathrm{diag}})] < 2N^2$, with
$N = 98$. However, for the purposes of the present paper, we take the set
{0, 0.1, ..., 49.9, 50} with a discretization step of 0.1 between consecutive values
(giving 501 values in total) as our sample of $\omega$ values.

In agreement with the properties derived in propositions 5.7 and 5.8, we observe
in Fig. 5.5(a) that normalized persistence (given by $\mathrm{Pers}(C)/[N(|T| - 1)]$)
tends to be larger for larger values of inter-layer coupling, and in Fig. 5.5(b) that
intra-layer modularity contributions (which we normalize by
$\sum_{s=1}^{|T|} \sum_{i,j=1}^{N} A_{ijs}$) tend to be smaller for larger values of
inter-layer coupling. The increase of persistence and the decrease of intra-layer
modularity contributions need not be monotonic, because we are finding a set of local
optima for each value of $\omega$ rather than the set of global optima.
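
As a minimal sketch of the quantity plotted in Fig. 5.5(a), the following functions
compute persistence between consecutive layers and its normalized value. They assume
that a multilayer partition is stored as a list of per-layer label lists. That data
layout and the function names are illustrative choices.

```python
def persistence_per_layer(labels):
    """Pers(C)|_s for each pair of consecutive layers (s, s + 1).

    labels[s][i] is the community label of node i in layer s.
    """
    return [
        sum(a == b for a, b in zip(prev, curr))
        for prev, curr in zip(labels, labels[1:])
    ]


def normalized_persistence(labels):
    """Pers(C) / [N (|T| - 1)], a value between 0 and 1."""
    n_layers, n_nodes = len(labels), len(labels[0])
    return sum(persistence_per_layer(labels)) / (n_nodes * (n_layers - 1))
```

For the hypothetical partition `[[0, 0, 1, 1], [0, 0, 1, 2], [0, 1, 1, 2]]`, three of
the four nodes keep their label across each pair of consecutive layers, so the
normalized persistence is 0.75.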

In Fig. 5.5(c), we show a sample output of the multilayer partition (which contains
35 communities). Some of the changes in community structure correspond to known
events (e.g., the Lehman bankruptcy in September 2008 [marked by an increase in the
size of the equity asset class]). Observe that the two largest communities are the ones 
that contain the government bond assets and the equity assets. In particular, the 
community that contains equities becomes noticeably larger in 2006 and in 2008. For 
larger values of the resolution parameter, this community instead becomes noticeably 
larger only in 2008. (By inspecting the correlation matrices, one can check that the 
increase in correlation between equities and other assets is greater in 2008 than in 
2006.) 


In Fig. 5.5(d), we show the matrix of mean values of persistence between all pairs
of layers. The $(s, r)$th entry is the term $\delta(c_{i_s}, c_{i_r})$, where
$s, r \in \{1, \ldots, |T|\}$ need not be from consecutive layers, averaged over nodes,
all values of $\omega \in [0, 50]$ in our sample, and multiple runs for each value of
$\omega$. Rather than showing only $\mathrm{Pers}(C)|_s$ for consecutive layers,
Fig. 5.5(d) gives some indication as to whether nodes change communities between
layers $s$ and $s + 1$ to join a community that contains a copy of some of these nodes
from another time layer (i.e., $\delta(c_{i_{s+1}}, c_{i_r}) \neq 0$ for some $r$) or to
join a community that does not contain a copy of these nodes in any other time layer
(i.e., $\delta(c_{i_{s+1}}, c_{i_r}) = 0$ for all $r$). Figure 5.5(d) also gives some
insight into whether there are sets of consecutive layers across which persistence
values remain relatively large. This may shed light on when connectivity patterns
change in a multilayer network. As indicated by the values on the color scale, the
values of persistence in Fig. 5.5(d) remain relatively high (which can partly be
explained by the fact that equities and bonds remain in the same community across
almost all layers, and these constitute roughly 50% of the node sample). The most
noticeable diagonal block separation in the middle of Fig. 5.5(d) corresponds to the
change in Fig. 5.5(c) between 2005 and 2006, and the smaller diagonal block at the
bottom right in Fig. 5.5(d) corresponds to the change in Fig. 5.5(c) after the Lehman
bankruptcy between 2008 and 2009.
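
A matrix of this kind can be sketched for a single multilayer partition as follows.
The data layout (a list of per-layer label lists) and the function name are
assumptions made for illustration. Averaging over coupling values and runs, as in the
figure, would amount to averaging several such matrices entry by entry.

```python
def layer_persistence_matrix(labels):
    """Fraction of nodes with the same label in layers s and r, for all (s, r).

    labels[s][i] is the community label of node i in layer s. Labels must come
    from one multilayer partition so that they are comparable across layers.
    """
    n_layers, n_nodes = len(labels), len(labels[0])
    return [
        [
            sum(a == b for a, b in zip(labels[s], labels[r])) / n_nodes
            for r in range(n_layers)
        ]
        for s in range(n_layers)
    ]
```

Diagonal entries equal 1 by construction, and blocks of large off-diagonal entries
indicate runs of layers across which most nodes keep their community.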

In Fig. 5.5(e), we show the co-classification index of nodes in partitions induced
on individual layers, which we average over layers, all values of $\omega \in [0, 50]$
in our sample, and multiple runs for each value of $\omega$ (we re-order the nodes to
emphasize diagonal blocks in the association matrix). This figure yields insight into
what sets of nodes belong to the same community across layers for increasing values of
$\omega$. This may shed light on connectivity patterns that are shared across layers.
Unsurprisingly, the first diagonal block mainly corresponds to bond assets and the
second diagonal block mainly corresponds to equity assets. Figures 5.5(d,e) complement
each other: at a given resolution, Fig. 5.5(d) gives an idea about when community
structure has changed, and Fig. 5.5(e) gives an idea about how it has changed.

Footnote: Note that there can also be smaller values of $\omega_\infty$ for which this
is true; in other words, we did not show that $\omega_\infty$ is the smallest lower
bound of the set
$\{\omega : \mathrm{Pers}(C_{\max}) = N(|T| - 1) \text{ for all } C_{\max} \in \mathcal{C}_{\max}(\omega)\}$.
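
A co-classification matrix of this kind can be sketched as follows. The input is
assumed to be a collection of single-layer label lists over the same node set (for
example, the partitions induced on each layer, pooled over runs). The function name
is hypothetical.

```python
def co_classification(partitions):
    """Fraction of partitions in which nodes i and j share a community.

    partitions is an iterable of label lists; each list assigns one label per
    node, and labels only need to be consistent within a single list.
    """
    partitions = list(partitions)
    n_nodes = len(partitions[0])
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for labels in partitions:
        for i in range(n_nodes):
            for j in range(n_nodes):
                counts[i][j] += labels[i] == labels[j]
    return [[count / len(partitions) for count in row] for row in counts]
```

Because only co-membership within each partition is counted, this measure does not
depend on how communities are labeled across layers, in contrast to persistence.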


6. Conclusions. Modularity maximization in temporal multilayer networks is a
clustering technique that produces a time-evolving partition of nodes. We have
investigated two questions that arise when using this method: (1) the role of null
networks in modularity maximization, and (2) the effect of inter-layer edges on the
multilayer modularity-maximization problem. We demonstrated that one must be cautious
in interpreting communities obtained with a null network in which the distribution of
expected edge weights is sample-dependent. Furthermore, we showed that an optimal
partition in multilayer modularity maximization reflects a trade-off between static
community structure within layers and persistence of community structure across
layers. One can try to exploit this in practice to detect changes in connectivity
patterns and shared connectivity in a time-dependent network.


At the heart of modularity maximization is a comparison between what one anticipates
and what one observes. The ability to specify what is anticipated is a desirable
(albeit under-exploited) feature of modularity maximization, because one can
explicitly adapt it for different applications. By defining a null model as a
probability distribution over the space of adjacency matrices and a null network as
the expected adjacency matrix under the specified distribution, we highlight the
important point that the same null network can correspond to different null models;
this is not something that has been appreciated properly in the literature. Moreover,
one needs to be very careful with one's choice of null network because it determines
what one regards as densely connected in a network: different choices in general yield
different communities. As we illustrated earlier for financial correlation networks,
this choice can have a large impact on results and can lead to misleading conclusions,
so one should be cautious when interpreting communities that one obtains with a null
network in which the distribution of expected edge weights is sample-dependent.


In Section 5, we proved several properties that describe the effect of ordinal
diagonal and uniform inter-layer coupling on multilayer modularity maximization, or
more generally, on any maximization problem that can be cast in the form (3.4).
Although our theoretical results do not necessarily apply to the local optima that one
attains in practice, they do provide useful guidelines for how to interpret the outcome
of a computational heuristic for maximizing modularity: if a multilayer partition is
inconsistent with one of the proven properties, then it must be an artifact of the
heuristic and not a feature of the quality function.


To further examine multilayer modularity maximization, we defined a measure
that we called persistence to quantify how much community assignments change in
time in a multilayer partition. For zero inter-layer coupling, the value of persistence
is 0, and it achieves a maximum finite value for sufficiently large inter-layer coupling.
We showed that the highest achievable value of persistence for an optimal partition
obtained with a given value of inter-layer coupling $\omega$ is a nondecreasing function
of $\omega$. Similarly, the highest achievable value of intra-layer contributions to
the quality function for an optimal partition obtained with a given value of
inter-layer coupling $\omega$ is a nonincreasing function of $\omega$. The notion of
persistence makes it possible to measure this trade-off between static community
structure within layers and temporal persistence across layers:

$$\max_{C \in \mathcal{C}} \sum_{s=1}^{|T|} \sum_{i,j=1}^{N} B_{ijs}\,\delta(c_{i_s}, c_{j_s}) + 2\omega\,\mathrm{Pers}(C).$$

We illustrated this trade-off in our numerical experiments.
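
The trade-off can be made concrete with a small sketch that evaluates the objective
for a given partition. It assumes that each per-layer matrix is a nested list and that
a partition is a list of per-layer label lists. The function name and the two-node
example are hypothetical.

```python
def multilayer_quality(B_layers, labels, omega):
    """Intra-layer contributions plus 2 * omega * persistence.

    B_layers[s][i][j] is the intra-layer entry for nodes i and j in layer s,
    and labels[s][i] is the community label of node i in layer s.
    """
    intra = sum(
        B[i][j]
        for B, layer in zip(B_layers, labels)
        for i in range(len(layer))
        for j in range(len(layer))
        if layer[i] == layer[j]
    )
    pers = sum(
        a == b
        for prev, curr in zip(labels, labels[1:])
        for a, b in zip(prev, curr)
    )
    return intra + 2 * omega * pers


# Two nodes, two layers: layer 1 favors grouping the nodes, layer 2 favors a split.
B_layers = [[[0, 1], [1, 0]], [[0, -1], [-1, 0]]]
follows_layers = [[0, 0], [0, 1]]  # best intra-layer structure, persistence 1
fully_persistent = [[0, 0], [0, 0]]  # persistence 2, lower intra-layer score
```

In this example, the two objective values are $2 + 2\omega$ and $4\omega$, so the
partition that follows each layer is preferred for $\omega < 1$ and the fully
persistent partition is preferred for $\omega > 1$.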

Finally, we showed that the Louvain heuristic can pose two issues when applied
to multilayer networks with ordinal diagonal and uniform coupling. These can produce
misleading values of persistence (or other quantitative measures of a multilayer
partition) and can cause one to draw false conclusions about temporal changes in
community structure in a network. We proposed ways to mitigate these problems
and showed several numerical experiments on real data as illustrations. To further
interpret these results, one needs to investigate more closely how the increase in
persistence and the decrease in intra-layer contributions to the quality function
actually manifest in a multilayer partition between the "boundary cases". This may
help identify an interval of $\omega$ values in which the trade-off between these two
quantities yields the most insights.